AD-A055 902

UNCLASSIFIED

SOFTECH INC WALTHAM MASS   F/G 9/2

OPERATIONAL SOFTWARE CONCEPT (OSC) EXECUTIVE EVALUATION/REFINEMENT (U)
AUG 77   M G WILLOUGHBY, C K HITCHON   F33615-76-C-1192

102S-3

AFAL-TR-77-87



REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER: AFAL-TR-77-87
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:

4. TITLE: OPERATIONAL SOFTWARE CONCEPT EXECUTIVE EVALUATION/REFINEMENT

7. AUTHOR(s): Michael G. Willoughby and Carl K. Hitchon

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   SofTech, Inc.
   460 Totten Pond Road
   Waltham, Massachusetts 02154

11. CONTROLLING OFFICE NAME AND ADDRESS:
   Air Force Avionics Laboratory
   System Technology Branch (AAT)
   Wright-Patterson AFB, Ohio 45433

12. REPORT DATE: June 1977

13. NUMBER OF PAGES: 145

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):

15. SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:

16. DISTRIBUTION STATEMENT (of this Report):
   Approved for public release; distribution unlimited.

19. KEY WORDS (Continue on reverse side if necessary and identify by block number):
   Modular Software          Avionics Software
   Directed Flow Graphs      System Software
   Support Software          Executive Software
   Higher Order Language     Software Design

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

This report describes executives built using the Operational Software
Concept (OSC). These executives are designed to operate on a federated
network of four DAIS processors connected by DAIS multiplex data busses.
In practice, the executives and the applications (represented by stubs)
have been implemented on a two processor system.

The applications supported by the executives are specified using Directed
Flowgraphs (DFG) as described in Technical Report AFAL-TR-74-168, Volume II,
from OSC Phase I. It is assumed that the reader of this report will be
familiar with the contents of the above-mentioned document.

The executives have been built based on the design presented in the OSC
Computer Program Development Specification, Technical Report for April 1976
to September 1976. The baseline executives were coded on the DAIS laboratory
PDP-10, primarily in J73I, providing full generality for the DFG specified
application. The DFG was provided by AFAL as representative of a DAIS
mission. These executives had time overheads of 37.1% and 41.0% and space
overheads of 36.5% and 33.7% for processors 0 (the head processor) and 1,
respectively. The baseline executives were tuned primarily for High Order
Language (HOL) inefficiencies, HOL deficiencies and generality reduction
for the specific DFG. The resulting final tuned executives had time
overheads of 11.1% and 11.3% and space overheads of 22.1% and 19.2% for
processors 0 and 1, respectively. These overhead figures and the detailed
statistics presented in the report represent the state of the executives on
20 December 1976 and are based on the DAIS processor described in
Specification Number MN255R817-1 of September 1976 and the DAIS multiplex
data bus described in Specification Number SA301300B-15 of March 1976.

This report is divided into three sections and an Appendix. The first
section describes the process of building an executive based on a DFG. The
second section describes the parameters affecting system performance that
are associated with the DFG supported by the executives. The third section
presents the baseline executive statistics, tuning method descriptions and
statistics for the final tuned executive. The Appendix provides program
listings of intermediate tuning results of the final tuned executives.

Section I


BUILDING AN EXECUTIVE

Building a baseline (untuned) OSC executive for a particular mission
is a two-step process. First, the formal Directed Flowgraph (DFG)
which specifies the mission must be studied, simplified if possible, and
converted into an executive DFG model which can be translated directly
into an executive data set. This process is described in detail in Section
1.2.

In the second step, the resulting executive data set is edited into
the appropriate COPY and COMPOOL source files as preset data. The
baseline mission executive is then created by performing the proper
sequence of compilations and link edits. This process is described in detail
in Section 1.1.

The resulting executive is fully functional but untuned. The processes
of tuning and of analyzing the results are described in detail in
Section 3.

1.1 Construction of Baseline Executives

The baseline executives for processor 0 and processor 1 can
be created from files residing on PPN [1111,351] of the DAIS PDP-10.
Special file naming conventions have been used to aid in the identification
of files and construction of the executives. These conventions are
described in the first subsection.

The creation of an executive is a three-step process. First,
all the required source files are moved to a separate PPN. Then they
are compiled, assembled and reformatted. Finally, they are linked to
create the desired executive. The executives are named OSC0.LDA
for processor 0 and OSC1.LDA for processor 1. Each of these steps
is aided by a set of submit files on PPN [1111,420]. These files are
prepared batch jobs that make the process easier and more reliable. The
second subsection describes the necessary submit files; the final
subsection describes how they are used to create the executive.

1.1.1 File Name Conventions


Each file name consists of two parts separated by a dot. The
name before the dot is used to indicate the type of information it contains;
that is, it indicates whether the file contains executive procedures, data
declarations, procedure declarations or initialization procedures. The
name after the dot, the extension, indicates the executives the file is used
to build, and whether it is a J73I program, DAIS assembly language
program, copy file, COMPOOL source or preprocessed COMPOOL. A
summary of the conventions is given in Figure 1a. The use of a "?" within
a name indicates that the appropriate letter is substituted for each "?" to
name the requested file. For example, the designation ???IPD indicates
that names such as ITCIPD, DACIPD, and MSMIPD are valid.
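The "?" substitution rule can be sketched as a small matcher. This is a minimal illustration in Python, not part of the OSC tooling: the function name `matches_designation` is hypothetical, and the sketch assumes that each "?" stands for exactly one letter.

```python
def matches_designation(name: str, designation: str) -> bool:
    """Return True if `name` is a valid substitution for `designation`.

    Each "?" in the designation stands for exactly one letter; every
    other character must match literally.
    """
    if len(name) != len(designation):
        return False
    return all(
        (d == "?" and c.isalpha()) or d == c
        for c, d in zip(name, designation)
    )


# The three examples given in the text for the designation ???IPD.
for candidate in ("ITCIPD", "DACIPD", "MSMIPD"):
    print(candidate, matches_designation(candidate, "???IPD"))
```

A name with a different suffix, or with the wrong number of leading letters, is rejected by the same check.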


File Type                        Common     Processor 0   Processor 1
                                 Name       Name          Name

J73I programs                    .J73       .J70          .J71
DAIS Assembly language
  programs                       .DAL       .DA0          .DA1
copy files                       .CPY       .CP0          .CP1
COMPOOL source files             .CPS       .CS0          .CS1
Preprocessed COMPOOL
  (one for each COMPOOL file)    .CMP       .CMP          .CMP
Executive procedures             ???PRC.    ???PRC.       ???PRC.
Initialization procedures        ???INT.    ???INT.       ???INT.
Data declarations                ???DCL.    ???DCL.       ???DCL.
Separate copy of initial data    ???IDT.    ???IDT.       ???IDT.
External procedure
  declarations                   ???EPD.    ???EPD.       ???EPD.
Internal procedure
  declarations                   ???IPD.    ???IPD.       ???IPD.

Figure 1a. FILE NAME CONVENTIONS

1.1.2 Submit Files


A set of submit files resides on PPN [1111,420] to assist in
constructing an executive. The following list names and describes each
of them.

MVOSC0.CTL  - copy all files in PPN [1111,351] needed to create the
              processor 0 executive. Rename all .CP0, .CS0, .J70 and
              .DA0 extensions to .CPY, .CPS, .J73 and .DAL.

MVOSC1.CTL  - copy all files in PPN [1111,351] needed to create the
              processor 1 executive. Rename all .CP1, .CS1, .J71 and
              .DA1 extensions to .CPY, .CPS, .J73 and .DAL.

PRC.CTL     - compile all executive J73 programs.

PD.CTL      - compile all procedure declaration COMPOOL files.

CMP.CTL     - compile all COMPOOL files that are not procedure
              declarations.

FMT.CTL     - reformat all executive .REL files into .DAT files.

ASSMBL.CTL  - assemble all executive DAIS Assembly Language files.

LINK0.CTL   - create the processor 0 executive OSC0.LDA.

LINK1.CTL   - create the processor 1 executive OSC1.LDA.
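The renaming performed by the two move jobs amounts to a per-processor extension mapping. The following Python sketch illustrates that mapping only; the contents of the actual submit files are not given in this report, the helper name `common_name` is hypothetical, and the sketch assumes the processor-specific extensions end in the digits 0 and 1.

```python
# Extension renames described for MVOSC0.CTL and MVOSC1.CTL.
RENAMES = {
    0: {"CP0": "CPY", "CS0": "CPS", "J70": "J73", "DA0": "DAL"},
    1: {"CP1": "CPY", "CS1": "CPS", "J71": "J73", "DA1": "DAL"},
}


def common_name(filename: str, processor: int) -> str:
    """Map a processor-specific file name to its common-name form.

    Assumes the name has the two-part "name.extension" form described
    in the file name conventions. Extensions that are not listed for
    the processor are left unchanged.
    """
    base, _, ext = filename.rpartition(".")
    return f"{base}.{RENAMES[processor].get(ext, ext)}"


print(common_name("APPRC.J70", 0))
print(common_name("APPRC.J71", 1))
```

Because both processors map onto the same common extensions, the later compile, assemble and link jobs can be shared; this is also why one executive should be finished before the other is started in the same PPN.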

1.1.3 Executive Construction

This description will assume the processor 0 executive is being
created. The processor 1 executive is created in the same way, with the
appropriate submit files substituted. One executive should be completed
before the next is begun. It is also important that the commands be
performed in the order listed, to ensure that the proper files are used,
and that each command complete before the next one begins.

The first step in building the executive is to copy the appropriate
files to a different PPN. This can be done with submit files residing on
[1111,420]. These submit files copy to whatever PPN the submit
file is on. The following command will copy the necessary files:

.SUBMIT MVOSC0 (MVOSC1)

All of the required source files have now been copied. The COMPOOLs
and programs must be compiled and assembled next.

.SUBMIT CMP
.SUBMIT PD
.SUBMIT PRC
.SUBMIT ASSMBL

Then the files are reformatted and linked to create the executive.

.SUBMIT FMT
.SUBMIT LINK0 (LINK1)

The resulting executive will be named OSC0.LDA (OSC1.LDA).
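The ordering constraint can be summarized by generating the command list for a given processor. This is only an illustrative Python sketch: the function name `build_commands` is hypothetical, and it does nothing more than reproduce the sequence listed above, which must be issued one command at a time with each job finishing before the next starts.

```python
from typing import List


def build_commands(processor: int) -> List[str]:
    """Return the submit commands, in required order, for one executive."""
    if processor not in (0, 1):
        raise ValueError("only processors 0 and 1 are described")
    return [
        f"SUBMIT MVOSC{processor}",  # copy and rename the source files
        "SUBMIT CMP",                # COMPOOLs that are not procedure declarations
        "SUBMIT PD",                 # procedure declaration COMPOOLs
        "SUBMIT PRC",                # executive programs
        "SUBMIT ASSMBL",             # DAIS assembly language files
        "SUBMIT FMT",                # reformat .REL files into .DAT files
        f"SUBMIT LINK{processor}",   # produce the executive load file
    ]


for command in build_commands(0):
    print(command)
```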
1.2 Construction of the Directed Flowgraph Model

1.2.1 Approach

The process of constructing an OSC DFG model begins when the
mission planner has produced a complete specification of the mission's
executive requirements in the form of a formal DFG together with
supplementary information. The formal DFG precisely depicts all data and
control relationships between the mission's tasks, sources, and sinks.
Supplementary information provides task execution rates and times,
data link descriptions, memory and response requirements, and any
other quantitative aspects of system performance or resource
requirements which may influence construction of the executive DFG model.

Each mission task depicted on the formal DFG is considered
as an indivisible unit. The internal operations of these tasks are not
specified in the formal DFG and are, in fact, only pertinent in their
effects on the supplementary quantitative information.

The job of the executive builder is to construct an OSC executive
which accurately models the formal DFG (in that the data and control
flow will be as specified) and which also meets the quantitative requirements
(such as execution rates, response times, etc.) on the target hardware
configuration. More specifically, since the OSC executive is a table-driven
DFG interpreter, the job of the executive builder is to construct a set of
tables which will drive the executive correctly and meet the quantitative
requirements.

One approach to this problem is to perform a one-to-one mapping
of the formal DFG's links, nodes, sources, etc. into the corresponding
models supported by the OSC executive. It is quite possible that such a
"brute force" translation of the formal DFG into an executive data set
could be performed automatically by a translator program driven by a
formalized linguistic representation of the DFG. This approach has a
serious drawback. A direct translation of a relatively complex DFG may
result in an executive which is so large and slow that space and execution
time requirements of the mission's tasks cannot be met. Moreover, since
such an automatic translator does not exist, a tedious, error-prone hand
translation must be performed.

For these reasons, the executive builder may be forced to reduce
the complexity of the DFG prior to final translation into an executive data
set. The need to reduce complexity, however, does not negate the
usefulness of an automatic translator. Even executive data sets produced by
the translator which turn out to be too inefficient can be made useful tools
for debugging the DFG by using task stubs and running virtual clocks at
a slower rate.

1.2.2 Reduction of DFG Complexity

The reduction of the complexity of a detailed formal DFG takes
place in successive stages. At each stage, a less complex model of the
DFG results. Each successive model must meet the data and control
constraints of the original formal DFG while also more closely
approaching a final DFG model whose run time overhead is low enough to leave
sufficient resources for the mission tasks. The process of complexity
reduction need not stop when this point is reached. Indeed, continuing
will further reduce the bulk of error-prone translation required and also
reduce executive overhead to allow for future expansion of mission
resource requirements.

The simplifications achieved at each stage of DFG complexity
reduction may result from any of a number of DFG tuning methods
(described below). Specific examples of each tuning method as applied
to the DAIS mission DFG are provided along with the method descriptions.
Details of the final tuned version of the DAIS DFG are presented in
Section 1.2.3. The notation used in the final DFG is not strictly formal DFG
notation but is an adaptation biased toward the actual OSC executive
modeling of DFG objects. A key to this notation is also presented in Section
1.2.3. Although only the final result of the DFG tuning process is
diagrammed in Section 1.2.3, several intermediate tuned versions were sketched
during the tuning process. These intermediate versions are included in
Appendix R.


The DFG complexity reduction process (i.e., DFG tuning) is
distinct from tuning of the executive programs and their data structures.
However, there is an important interaction to consider. Tuning of the
DFG may result in a model which requires considerably less than the full
functionality provided by the baseline executive. In anticipation of this
situation the functional features of the OSC executive were designed and
implemented in a modular fashion to make their deletion a simple matter
of excluding certain procedures or data structures from the executive
built. It is worth noting that tuning of the formal DAIS DFG resulted in
a significant reduction in the various executive functions required. In
fact, further tuning of the DFG was deliberately avoided in order to retain
examples of all important executive capabilities.

1.2.2.1 Preemption Reduction

The execution of two tasks, A and B, is said to be interleaved if
part or all of task A is executed after task B begins execution but before
B completes or vice versa. The execution of tasks A and B is further
said to be concurrent if at any time both A and B are in execution.
Concurrent execution can occur only in configurations containing more than
one processor. Examples are multiprocessor systems where memory
is shared, and federated processor systems where a data transport
resource is shared. In a uniprocessor system, only interleaved execution
of tasks is possible. In particular, each processor in a federated
configuration is a uniprocessor in its own context. In a uniprocessor system,
interleaved execution occurs when one task preempts another, i.e., when
the execution of one task is temporarily suspended to allow execution of
a more urgent task.
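The two definitions can be made concrete with a small Python sketch. It is an illustration only, under the assumption that each task's execution is recorded as a list of (start, end) processor time slices; the function names are hypothetical.

```python
from typing import List, Tuple

Slice = Tuple[float, float]  # (start, end) of one uninterrupted run


def _overlap(a: Slice, b: Slice) -> bool:
    """True if the two intervals share more than a single instant."""
    return max(a[0], b[0]) < min(a[1], b[1])


def _span(slices: List[Slice]) -> Slice:
    """Interval from the moment a task begins until it completes."""
    return (min(s[0] for s in slices), max(s[1] for s in slices))


def interleaved(a: List[Slice], b: List[Slice]) -> bool:
    """Part of one task runs after the other begins but before it completes."""
    return (any(_overlap(s, _span(b)) for s in a)
            or any(_overlap(s, _span(a)) for s in b))


def concurrent(a: List[Slice], b: List[Slice]) -> bool:
    """At some time both tasks are actually in execution."""
    return any(_overlap(sa, sb) for sa in a for sb in b)


# Uniprocessor preemption: B runs, A preempts it, then B resumes.
task_a = [(2.0, 3.0)]
task_b = [(0.0, 2.0), (3.0, 5.0)]
print(interleaved(task_a, task_b), concurrent(task_a, task_b))
```

In the preemption example the tasks are interleaved but never concurrent, which is the only kind of overlap a uniprocessor can produce.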

A formal DFG may impose many constraints on task preemption.
For example, in the DFG below, task B must complete execution before
task A can be executed.


In the next DFG, execution of tasks A, B, and C will be mutually exclusive.


For  other  tasks,  the  DFG  may  impose  few  constraints  or  none  at 
all.  The  purpose  of  the  formal  DFG  is  to  specify  exactly  those  data  and 
control  constraints  that  are  required  for  proper  system  operation  and 
no  more.  Since  all  data  and  control  relationships  among  the  tasks  are 
specified  on  the  formal  DFG,  any  task  execution  policy  satisfying  the 
DFG  constraints  will  perform  correctly. 

The executive builder's options in construction are proportional
to the level of detail on the formal DFG specification. The further each
task is broken down into smaller tasks, the easier it is to isolate the
sources of contention for resources.

The purpose of permitting preemption of one task by another
within a uniprocessor system is only to satisfy response requirements.
That is, a task with short response requirements may be required to run
at a time when another task is already running. The task already in
execution may take so long to complete that the other task's response
requirement cannot be met if the executing task is allowed to run to
completion.

The reduction or elimination of preemption requirements is the
single most effective method for reducing the complexity of the DFG
model. The reason for this is that tasks which cannot preempt one
another can access the same global data without a contention problem
(e.g., one task reading the data while the other is writing it). If tasks
which must preempt one another also contend for global data, then the
executive must coordinate access to that data. Consider the example
in Figure 1. Task A writes asynchronously into storage node B,
task C synchronously updates B, and tasks D and E asynchronously read
B. Now suppose task A is allowed to preempt task C. A may then change
the contents of B while C is running. Hence C must be provided with a
separate copy of the data in B. In a similar way, if C can preempt D,
then while D is running, C may write new data into B. Hence D must
be provided with a separate copy of B's data.

The OSC executive provides the facilities of the DAC (Data Access
Control) cluster to manage such situations. There are basically two ways
of handling these problems with DAC, one in which the data is statically
allocated (fixed locations for data buffers) and one in which the data space
is dynamically allocated (locations for data buffers are determined at run
time). Which solution is better depends upon the length of the data and
the execution rates of the accessing tasks. In the static method the "current"
data for B is always contained in a global statically allocated storage block
(SSB). Each time a task which reads B is scheduled for execution, the
executive copies the data in B to a local SSB which the task accesses instead
of the global copy. Each task which writes into B writes instead into its
own local copy of B and the executive copies the data into the global SSB
when the task completes (see Figure 1). In the dynamic method, static
storage blocks (SSBs) are replaced by dynamic storage blocks (DSBs) and
only pointers to the DSBs are copied by the executive.
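The difference between the two methods can be sketched in a few lines of Python. This is an illustrative model, not the DAC implementation: the class and method names are hypothetical, lists stand in for SSBs, and immutable tuples stand in for DSBs that are never modified once filled.

```python
from typing import List, Tuple


class StaticNode:
    """Storage node whose current data lives in a global SSB."""

    def __init__(self, data: List[int]) -> None:
        self.global_ssb = list(data)

    def copy_in(self) -> List[int]:
        # Executive action when a reading task is scheduled: the task
        # works on a local SSB and never touches the global one.
        return list(self.global_ssb)

    def copy_out(self, local_ssb: List[int]) -> None:
        # Executive action when a writing task completes.
        self.global_ssb[:] = local_ssb


class DynamicNode:
    """Storage node whose current data is reached through a DSB pointer."""

    def __init__(self, block: Tuple[int, ...]) -> None:
        self.current = block

    def copy_in(self) -> Tuple[int, ...]:
        # Only the pointer (here, a reference) is copied.
        return self.current

    def copy_out(self, new_block: Tuple[int, ...]) -> None:
        # The writer filled a fresh DSB; only the pointer is replaced.
        self.current = new_block


# Task D is scheduled, then task C preempts it and completes a write.
node = StaticNode([1, 2, 3])
d_view = node.copy_in()
node.copy_out([9, 9, 9])
print(d_view, node.global_ssb)
```

The static method pays a copy proportional to the data length on every scheduling, while the dynamic method pays a fixed pointer copy plus allocation, which is why the better choice depends on data length and task execution rates.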

Clearly, considerable executive overhead may be required to perform
such control of data access. On the other hand, if tasks A, C, D and E
in Figure 1 are not allowed to preempt one another, then each task
will have exclusive access to B while it executes. In this case, no
contention problem exists, and each task may directly reference the single
global block B.

Another problem associated with preemption is that of application
programs which are shared by two or more application tasks. If two tasks
which can preempt one another invoke the same subroutine, then that
subroutine must be reentrant. Reentrancy is usually obtained at some cost
in efficiency and, in the case of J73I, requires error-prone
program-controlled stack management.

In summary, the key factors determining the need for preemption
are the response required for task execution and the maximum individual
task execution times. In general, preemption can be avoided when the
maximum task execution time is small relative to the most severe response
requirements. For this reason, it is important that the mission planner
specify the system with the most detailed DFG practical. In the DAIS
mission DFG, the maximum execution time of a single task was
approximately 6 milliseconds while the tightest response requirement
was approximately 30 milliseconds. Even under maximum load
conditions, it was possible to meet the response requirements without
allowing any preemption. However, in order to demonstrate the executive
handling of preemption, the longest task combination, AT20 on the
final DFG with execution time 8 milliseconds, was made preemptable.
As a consequence, data selector 21 and the link data coming into AT20
had to be put under executive management.
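A rough feasibility check for running without preemption can be sketched as follows. This Python fragment is a simplification offered for illustration: it assumes the only delay an urgent task suffers is waiting for the longest other task to finish, and it ignores queueing behind additional ready tasks. The function name, task names and the 2 millisecond figure are hypothetical; the 6 and 30 millisecond figures are the approximate values quoted above.

```python
from typing import Dict


def can_avoid_preemption(exec_ms: Dict[str, float],
                         response_ms: Dict[str, float]) -> bool:
    """Check each response requirement against worst-case blocking.

    Without preemption, a task that becomes ready may have to wait for
    the longest other task to run to completion before it can start.
    """
    for task, limit in response_ms.items():
        others = [t for name, t in exec_ms.items() if name != task]
        blocking = max(others, default=0.0)
        if blocking + exec_ms[task] > limit:
            return False
    return True


exec_ms = {"LONGEST": 6.0, "URGENT": 2.0}
response_ms = {"URGENT": 30.0}
print(can_avoid_preemption(exec_ms, response_ms))
```

Under these assumptions the requirement is met with a wide margin, consistent with the observation that preemption was not actually needed for the DAIS mission DFG.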

1.2.2.2 Combining Tasks

Another important means of simplifying the DFG is to combine
tasks on the formal DFG into larger tasks. This is accomplished by
writing a skeletal master task which simply calls each task in the
combination as a subroutine. Since subtasks within such a task combination
are executed sequentially, there is no mutual preemption. There are
several benefits to be gained by combining tasks:

• Executive scheduling overhead is reduced since one
scheduling of the combined task is equivalent to
scheduling all subtasks.

• Executive table space required for the combined task
is the same as the space otherwise required for one
subtask.

• The source/sink requirements of each subtask are
combined resulting in fewer calls to DTE, a reduction
in the number of access controllers required and
batching of I/O operations.

• Control signals which individually activate each
subtask are combined into a single control signal.
The benefits of combining tasks must be realized while adhering
to the DFG specification. Tasks may be combined without deviating from
DTE requirements if they are executed under the same conditions, for
example, tasks which are controlled by the same clock or control link
(through an identity). In such task combinations, the skeletal master
task simply calls each subtask in any order. Tasks which are connected
to one another by simple data links may be combined. In these task
combinations, the skeletal master task calls each subtask after all the
subtasks providing input links to that subtask have been executed. Calls to
subtasks in the master skeletal task are carefully ordered to reflect link
imposed execution order. Finally, tasks interconnected by control
selectors may be combined. The control selector nodes which connect the tasks
are implemented in the skeletal master task as if-then-else statements
which call each subtask when the corresponding control selector predicate
indicates.

In many task combinations created for the DAIS DFG, subtasks
which are conditionally executed produce output to devices. A new
software signal capability (via procedure SIGAC) was added to allow the
skeletal master tasks to conditionally signal that one or more output
operations be performed when the task combination completes.
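A skeletal master task of this kind can be pictured with the following Python sketch. It is not the J73I code of the executive: the subtask and output names are invented, and the `signal` callback is merely a stand-in for the role the text assigns to SIGAC, whose real interface is not described here.

```python
from typing import Callable, List

Signal = Callable[[str], None]


def run_combination(master: Callable[[Signal], None]) -> List[str]:
    """Run a skeletal master task and return the outputs it requested.

    Requests are only collected while the combination runs; they are
    acted on after the master returns, i.e. on completion.
    """
    requested: List[str] = []
    master(requested.append)
    return requested


def make_selector_master(mode: int, trace: List[str]) -> Callable[[Signal], None]:
    """Build a master task whose control selector is an if-then-else."""

    def master(signal: Signal) -> None:
        trace.append("T_COMMON")      # subtask executed unconditionally
        if mode == 0:                 # control selector predicate
            trace.append("T_A")       # conditionally executed subtask
            signal("OUTPUT_A")        # request its device output
        else:
            trace.append("T_B")
            signal("OUTPUT_B")

    return master


trace: List[str] = []
print(run_combination(make_selector_master(0, trace)), trace)
```

Only the output belonging to the subtask that actually ran is requested, so unconditional I/O for the whole combination is avoided.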

Large combinations of tasks can result in an increase in the
maximum task execution time to a point where some task combinations
must be made preemptable in order to meet response requirements.
This can result in an increase in executive data access control overhead
greater than the overhead saved by combining the tasks. Hence, task
combinations which complicate meeting response requirements should
be avoided. In the DAIS mission DFG, such combinations were avoided
except in the case of AT20. AT20 is the longest executing task
combination (8 milliseconds) and was made preemptable for the purpose of
demonstrating the executive's preemption capability.

Some examples of task combinations in the tuned DAIS DFG are
AT01 which combines most of the tasks and control selectors involved in
handling asynchronous pilot inputs, and AT18 which combines several
tasks which must run at the 8/sec rate. Complete details of the tuned
DAIS DFG are presented in Section 1.2.3. Details of the flow of control
within each task combination's master skeletal task are specified in the
J73I program APPRC.J70 for processor 0 and in APPRC.J71 for
processor 1.

1.2.2.3 Serializers

The serializer nodes drawn on a DFG may have quite complex
implications. In the worst case, a serializer implies a number of input links
containing data which must be queued by the executive as they become enabled
and then serially removed from the queue and copied to the output link.
However, in actual practice, this full functionality of the serializer may
not be required.

In the simplest case a serializer may imply enabling a particular
link whenever any one of a number of mutually exclusive links is enabled.
In such cases no queueing is required and each input link to the serializer
may be replaced by the single output link. Examples of such serializers
on the DAIS DFG abound. The serializers which control the iterative tasks
T53 and T55 on page 3B of the DFG are good examples as are the serializers
on the same page which accept data from each loop output.

A slightly more complicated case is a control serializer with inputs
which are not mutually exclusive. Again executive queueing is easily
avoided by allowing the wait count for the task connected to the output
link to be decremented below zero. When the task completes it is
rescheduled if its recomputed wait count is still less than or equal to zero. On
the DAIS DFG examples of such tasks are T36 and T22 (AT36 and AT22
on the tuned DFG).
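The wait count technique can be illustrated with a short Python model. The details are assumptions made for the sake of the sketch: each enabling signal decrements the count, a task becomes schedulable when the count is zero or less, and on completion the count is recomputed by adding back a reset value. The class and attribute names are hypothetical.

```python
class WaitCountTask:
    """Task driven by a control serializer with non-exclusive inputs."""

    def __init__(self, reset: int = 1) -> None:
        self.reset = reset          # enablings needed for one execution
        self.wait_count = reset
        self.scheduled = False
        self.runs = 0

    def signal(self) -> None:
        # An input link was enabled. The count may drop below zero if
        # signals arrive while the task is already scheduled or running.
        self.wait_count -= 1
        if self.wait_count <= 0:
            self.scheduled = True

    def complete(self) -> None:
        # Recompute the wait count; reschedule if it is still <= 0.
        self.runs += 1
        self.wait_count += self.reset
        self.scheduled = self.wait_count <= 0


task = WaitCountTask()
for _ in range(3):          # three signals arrive before the task finishes
    task.signal()
while task.scheduled:
    task.complete()
print(task.runs, task.wait_count)
```

The negative count acts as an implicit queue length, so every signal is eventually honored without the executive keeping an explicit queue.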

Finally, there is a more difficult type of data serializer where again
mutual exclusion is not obvious from the DFG. Here it is possible to
guarantee serialization by including the serializer's output task in a separate
task combination with each input task to the serializer and disallowing mutual
preemption of these task combinations. It is important to realize that this
method neither implies that multiple copies of the output task are required
nor that the task must be reentrant. Examples of this type of serializer
implementation in the tuned DFG are tasks AT01, AT08, and AT06 which
all may call the same subtask (T08) and cannot preempt one another. In
some cases, the inclusion of a subtask in two or more task combinations
has forced duplication of its output devices so that a different data area
can be used for each instance.

1.2.2.4 Complex DFG Constructs

In some cases complex appearing DFG constructs yield to quite
simple executive implementation. That is, in some cases, the OSC
executive can model a complex combination of nodes without modeling
each individual node. Good examples of this type of simplification are
the eight clock controlled loops on page 3B of the formal DAIS DFG
specification. These loops were reduced to simple combinations of
gates and tasks in the tuned DFG. The control selectors which monitor
each loop have been subsumed into each loop task and are manifested
as software signals which enable and disable the appropriate gates.

The  technique  of  combining  control  selectors  into  a combination 
with  the  task  which  produces  the  controlling  data  and  allowing  the  skeletal 
master  task  for  this  combination  to  produce  the  appropriate  software 
signals  was  used  throughout  the  tuned  DFG  to  eliminate  the  unnecessary 
overhead of treating each control selector as a separate node.

on the left side of page 4A of the formal DFG. The task descriptions
provided as part of the specification reveal that the software signals
(S41A, S11, etc.) controlling these tasks are mutually exclusive. Because
of this it is possible to replace all those signals with a single signal and
a data word which specifies which signal has occurred.


The associated tasks can be bound into a single task combination signaled
by the new signal. The skeletal task for the combination can use the data
word as an index in a SWITCH statement to dispatch control to the
appropriate subtask.
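The signal/data trade described above can be sketched as follows. This is only an illustration in Python, not the J73I code of the executive: the subtask names, the selector values, and the `skeletal_task` function are all hypothetical stand-ins for the SWITCH statement in the skeletal task.

```python
from typing import Callable, Dict, List

# Hypothetical record of which subtask ran, so the effect is observable.
run_log: List[str] = []


def subtask_a() -> None:
    run_log.append("A")


def subtask_b() -> None:
    run_log.append("B")


def subtask_c() -> None:
    run_log.append("C")


# One entry per formerly separate, mutually exclusive software signal.
# The integer key plays the role of the data word written by the signaler.
DISPATCH: Dict[int, Callable[[], None]] = {
    0: subtask_a,
    1: subtask_b,
    2: subtask_c,
}


def skeletal_task(selector_word: int) -> None:
    """Body of the combined task: runs once per enabling of the single
    signal and uses the data word to pick exactly one subtask."""
    try:
        subtask = DISPATCH[selector_word]
    except KeyError:
        raise ValueError(f"unknown selector word {selector_word}") from None
    subtask()
```

Because the original signals never occur together, one signal plus one data word carries the same information, and the executive manages one task node instead of several.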




Another example of a simplifying signal/data trade involves the
tasks at the top of page 4B of the formal DFG. Here, many signalled
control links select an output device for T86. Instead, the control signals
can be combined into a simple bit mask which is updated by the control
tasks and read by T86 to select the proper output device.
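A minimal sketch of the bit-mask form of this trade is given below. The device names and bit assignments are hypothetical; the report does not list them. The control tasks call `enable` and `disable`, and the output task reads `selected` when it runs.

```python
from typing import Dict, List

# Hypothetical bit assignment: one bit per selectable output device.
DEVICE_BITS: Dict[str, int] = {
    "device_0": 0b001,
    "device_1": 0b010,
    "device_2": 0b100,
}


class OutputSelectMask:
    """Data word shared by the control tasks (writers) and the output task
    (reader), replacing one signalled control link per device."""

    def __init__(self) -> None:
        self.mask = 0

    def enable(self, device: str) -> None:
        self.mask |= DEVICE_BITS[device]

    def disable(self, device: str) -> None:
        self.mask &= ~DEVICE_BITS[device]

    def selected(self) -> List[str]:
        # Devices whose bit is currently set, in table order.
        return [name for name, bit in DEVICE_BITS.items() if self.mask & bit]
```

The sketch assumes the writers and the reader cannot preempt one another, so the mask needs no executive-managed access control.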


1.  2.  2.  5 Partitioning 


Many factors must be considered in partitioning the DFG among
processors. The fact that the formal DFG was designed for a four
processor system but had to be partitioned for a two processor system
made careful consideration of these factors critical. These factors
include memory usage balance (for both executive and applications),
functional isolation to avoid total system failure in the event of a single
processor failure, processor time requirements, transfer rates along
inter-processor links, distribution of source/sink load, and allocation
of subaddresses. Generous detail in the formal DFG is an important
aid to effective partitioning.

In partitioning the DAIS DFG all these factors were considered.
The result consists of a well balanced two processor partitioning of an
application intended for four processors. Most of the inter-processor
communication is from processor 0 to processor 1. Data is sent from
processor 0 to processor 1 each time a data selector which is asynchronously
accessed (at a high rate) by processor 1 is updated by processor 0.
Processor 0 also requires asynchronous access to some data selectors
in processor 1, namely 43, 44 and 52. These accesses occur at a low
rate while updating in processor 1 occurs at a high rate. Hence, rather
than sending this data via DTF each time it is updated, processor 0 sends
a request for the data (requiring transmission of one data word) when it
is needed. Processor 1 then responds by sending back the requested data.
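The choice between sending on every update and sending on request can be sketched as a simple comparison of bus words per second. The function names and the rates in the usage note are hypothetical, and the model ignores command words and other per-message overhead; it only illustrates why a high update rate combined with a low access rate favors the request scheme.

```python
def push_words_per_second(update_rate_hz: float, data_words: int) -> float:
    """Bus load if the owner sends the data every time it is updated."""
    return update_rate_hz * data_words


def request_words_per_second(access_rate_hz: float, data_words: int,
                             request_words: int = 1) -> float:
    """Bus load if the reader sends a one-word request and gets the data back."""
    return access_rate_hz * (request_words + data_words)


def cheaper_strategy(update_rate_hz: float, access_rate_hz: float,
                     data_words: int) -> str:
    """Return 'push' or 'request', whichever moves fewer words per second."""
    push = push_words_per_second(update_rate_hz, data_words)
    request = request_words_per_second(access_rate_hz, data_words)
    return "push" if push <= request else "request"
```

For example, with an assumed 4-word data selector updated 32 times per second but read once per second, pushing costs 128 words per second while requesting costs 5.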

1. 2. 2. 6 Other Possible Simplifications

Because of the desire to demonstrate various executive features,
not all possible simplifications were applied to the formal DAIS DFG. In
particular, more tasks could have been combined into larger tasks. For
example, tasks running at one rate could be combined with tasks running
at a slightly higher rate. Although such tasks would be executed at a
higher rate than required, thus consuming more processor time, gains
in reduced executive overhead obtained through elimination of tasks and
clocks might more than compensate for this apparent inefficiency.


Tasks T89, T90, and T91 present an instance where a signal/data
trade could have been made eliminating two task nodes, two clock pins,
three signals and three gates. In this simplification the gating signals
produced by T87 would be replaced by a data word which selects the
appropriate subtask in a combination task including T89, T90, and T91.

1.2.3 Final DFG Model

The following subsections (1.2.3.1 - 1.2.3.4) present the results
of the DFG complexity reduction process as applied to the DAIS mission
DFG. The notation used is not the formal DFG notation but an adaptation
of this notation which corresponds more closely to the OSC executive
constructs which model a DFG. The meaning of each symbol used is
specified in the subsection which follows.

Appendix B includes working DFG's which were drawn at various
stages in the tuning process. The final tuned version which follows is
the result of repeated application of all the tuning techniques described in
the preceding sections. The constructs which are included in the final
tuned version are only those which require active management by the OSC
executive for correct DFG operation. Constructs not requiring executive
management were omitted to simplify the diagram.

Although all of the tuning methods described were applied, two of
them account for most of the simplification achieved. One is the
combination of tasks which are activated under identical conditions into larger tasks.
This tuning method is described in Section 1. 2. 2. 2. The resulting task
combinations created for the DAIS mission DFG are listed in Section 1. 2. 3. 4.
The other frequently used tuning technique is limiting of intertask preemption.
This technique is described in Section 1. 2. 2. 1. Although the mission
response requirements do not mandate any intertask preemption, preemption
of the longest executing tasks was permitted in order to demonstrate the
executive's full capabilities. The preemption structure for the processor
0 DFG is:

[Figure: preemption structure diagram]


Tasks which cannot preempt one another can share access to data selector
storage nodes without executive intervention. It is for this reason that
only a few of these nodes appear in the final tuned DFG. The storage nodes
which do appear are exactly those whose access must be managed by the
executive to avoid contention among tasks which can preempt one another.
It is also the reason that several tasks appear to have no data inputs and/or
outputs (in particular task combinations AT55, AT68, AT81, and AT86).
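The rule for deciding which storage nodes stay in the tuned DFG can be sketched as a small check over the preemption relation. The task and node names in the usage note are hypothetical, and the report does not say the selection was automated; this only restates the rule in executable form.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def needs_management(accessors: Set[str],
                     can_preempt: Set[Tuple[str, str]]) -> bool:
    """True if any two tasks sharing the node can preempt one another
    (in either direction), so access must go through the executive."""
    return any(
        (a, b) in can_preempt or (b, a) in can_preempt
        for a, b in combinations(sorted(accessors), 2)
    )


def managed_nodes(access_map: Dict[str, Set[str]],
                  can_preempt: Set[Tuple[str, str]]) -> List[str]:
    """Storage nodes that must remain executive-managed in the tuned DFG."""
    return sorted(
        node for node, accessors in access_map.items()
        if needs_management(accessors, can_preempt)
    )
```

With an assumed relation in which only `task_fast` may preempt `task_long`, a node shared by those two tasks stays managed, while a node shared by `task_fast` and `task_mid` drops out of the diagram.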

Some  of  the  task  nodes  in  the  final  DFG  do  not  appear  in  the  original 
formal  DFG.  These  nodes  are  special  purpose  nodes  which  aid  the  executive 
in  initialization  and  failure  detection.  They  are  described  in  detail  below. 

Special  Processor  0 Nodes 

1) RFIN - This sink node, when activated, signals completion
   of initialization of both processors and causes the
   processor 0 executive to start its periodic clocks
   running.

2) RCLK - This sink node is activated periodically by a
   clock signal. It causes a special message, which
   is used to synchronize the real time clocks in
   both processors, to be sent to processor 1.

3) ATRPA - This task node is activated periodically by a
   clock signal. Each time it is activated, it checks
   a flag which is set if a message has been received
   from the other processor. If no message has been
   received, the other processor is assumed to
   have failed and its failure link (RPFAIL) is
   enabled. Otherwise the flag is simply reset.

Special Processor 1 Nodes

1) RFIN - This sink node is activated when the processor
   1 executive completes initialization. It sends a
   message to processor 0 which causes link RPFIN
   (remote processor finish link) to be enabled in
   processor 0.

2) ATRPA - This task node has the same function as the
   node with the same name in processor 0.
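The failure check performed by ATRPA can be sketched as below. The class and method names are hypothetical; only the behavior (a flag set on message receipt, tested and reset on each clock activation, with the failure link enabled when the flag is found clear) is taken from the description above.

```python
class RemoteProcessorMonitor:
    """Sketch of the ATRPA check: a periodic task that declares the other
    processor failed if no message arrived since the previous activation."""

    def __init__(self) -> None:
        self.message_received = False      # set by the message handler
        self.failure_link_enabled = False  # stands in for link RPFAIL

    def on_message(self) -> None:
        """Called whenever a message from the other processor is received."""
        self.message_received = True

    def on_clock(self) -> None:
        """Called on each periodic clock activation of the task."""
        if self.message_received:
            self.message_received = False  # reset and wait for the next one
        else:
            self.failure_link_enabled = True
```

The check period must be longer than the longest expected gap between messages; otherwise a healthy processor would be reported as failed.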

The numbers which appear on the data links which are input to
nodes DI1 and DI2 in the processor 0 DFG also have a special meaning.
They indicate the number of times each link must be enabled before the
terminating node is activated. In effect these links function as speed
changers (discarders). This implementation of speed changers is more
efficient for discarder ratios which are integral than the discarder node
itself.
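A counting link of this kind can be sketched as follows. The class name and interface are hypothetical; the point is only that an integral discarder ratio reduces to a counter on the link, with no separate discarder node to activate.

```python
class CountingLink:
    """Link that activates its terminating node only on every
    `required`-th enabling, acting as an integral-ratio discarder."""

    def __init__(self, required: int) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self.count = 0

    def enable(self) -> bool:
        """Record one enabling; return True when the node should activate."""
        self.count += 1
        if self.count == self.required:
            self.count = 0
            return True
        return False
```

A link marked 4, for instance, passes one activation for every four enablings and discards the other three.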


1.2.  3.1  Key  to  DFG  Models 


Application Task
    ATxxx is the index of the corresponding node.

Data Link
Control Link
Asynchronous Access Link
Data and Control link to and from remote processor
    Nxx or Axx is the index of the DTF notification or DAC access
    controller associated with the link (if any).

Software signaled link (link signaled by a task via executive interface
procedures: SIGNLEVENT, ENBLGATE, DSBLGATE, or SIGAC)

Pin (holds event time for the most recent enabling)
    name is the index of the pin.

Clock Pin (frequency is given in parentheses)
    Pxxx is the index of the pin.

Gate
    GTxxx is the index of the gate.

Task with serialized control (i.e., task runs once for each enabling of
a, b, and c even if they occur simultaneously)

Simple Identity (control)
    SIx is the index of the node.

Data Identity
    DIx is the index of the node.

Inverter (changes enable to disable)
    IVx is the index of the node.

Data Selector Storage Node (static allocation)
    SSCxxx is the index of the corresponding static storage controller.

Data Selector Storage Node (dynamic allocation)
    DSCxxx is the index of the corresponding dynamic storage controller.

Source (for tasks). Data is read from the source when the task is ready
to be scheduled. When notification is received that the data has
arrived, the task is scheduled.
    ADxx is the access controller on the task node's begin access list
    which causes the source to be read. Dxx is the name(s) of the
    device(s) read. NDxx is the notification code provided by DTF
    when the data has been read.

Source (stand alone). Data is read from the device when the control
link becomes enabled; notification is provided when the transfer is
complete.
    RDxx is the index of the source node (RL node). Dxx is the
    name(s) of the device(s) read. NDxx is the notification code
    provided by DTF when the data has been read.

Sink (for tasks). Data is written to the sink when the task completes.
If a notification link is shown, then the task may not be scheduled
for execution again until output is complete.
    ADxx is the index of the access controller on the task node's
    end access list which causes the data to be written. Dxx is
    the device name(s). NDxx (optional) is the notification code
    provided by DTF when the data has been written.

Sink (stand alone). Data (accessed via the asynchronous link) is
written to the device(s) when the control link becomes enabled;
notification is provided when the transfer is complete.
    RDxx is the index of the sink node (RL node). Dxx is the
    device name(s). NDxx is the notification code provided by
    DTF when the transfer is complete.

Node which becomes active whenever a, b, or c is enabled; a, b, and c
never occur simultaneously.

Link which enables a gate (EG).

Link which disables a gate (DG).

Link which can enable or disable a gate (EG/DG).

Link to remote processor which carries data selector data.
    xx is the name(s) of the data selector node(s) whose data is sent
    to the remote processor. ADSSxx is the access controller in the
    task's end access list which causes the transfer, and NDSSxx
    is the notification code (if any) provided by DTF when the
    transfer is complete.

Asynchronous access link to data selector node in a remote processor.
When the task node becomes active, a message requesting the data is
sent to the remote processor. When data is received from the remote
processor, the task is scheduled for execution.
    ADSSxx is the index of the access controller in the task
    node's begin access list which causes a message to be sent
    to the remote processor requesting the data in the data
    selector(s) xx. When the data is received DTF notifies the
    task node via the notification code NDSSxx.

Link which carries data from a remote data selector storage node
asynchronously for asynchronous access.
    NDSSxx is the notification code provided by DTF whenever
    data for the data selector(s) xx is received. DSCxx is the index
    of the dynamic storage controller for the data selector(s).
    ADSSxx is the access controller used by the task node to gain
    access to the latest dynamic copy of the data.

Data selector data sent in response to asynchronous request from a
remote processor.
    When a data request is received, DTF provides the notification
    code NDSSxx which causes the sink node Rzz to become active.
    The requested data is copied from the data selector storage
    node xx via the access controller with index AxxRzz and sent
    to the remote processor which requested it.


1. 2. 3. 2 Processor 0 DFG Model

1. 2. 3. 4 Correspondence of DFG Model to Formal DFG

DFG Model Task Node / Processor / Formal DFG Task and Control Selector Nodes Included

ATOl 

0 

TOl  , 

T02. 

TO 3.  TOO. 

TOS,  TOO.  T07.  '] 

T87. 

TOO. 

TIOO,  Cl . 

C2.  Cl.  C4.  CS. 

C7.  ( 

C14,  ( 

:'U) 

A TO*) 

0 

TOS. 

TOO, 

T07 

A TO  8 

0 

TO  8. 

C8 

A TOO 

0 

TO  4. 

T07, 

TOO.  TOO, 

CIO 

ATI  2 

0 

TI2. 

T02. 

T88 

ATI  8 

0 

TIO, 

Tt7. 

T18,  TIO. 

T»4,  'MS.  Tf7,  " 

r44. 

T4S. 

lO’O.  Tn4. 

TlOf.  Cl  2 

AT20 

0 

T1 1 . 

•n  f. 

TI4.  TIS. 

TIO,  T20 

AT2I 

0 

T21  , 

T2  f, 

T24.  T2S, 

T20.  T28 

AT22 

0 

T22. 

C.l  1 

ATt2 

0 

Tf2. 

T f f 

A T W. 

0 

■r2o. 

T Ul, 

rU.  IMO. 

T40.  rOl.  T42.  •] 

C27 

AT  <8 

0 

Tf8. 

T57 

AT4». 

0 

T2(.. 

rto. 

CM.  CIS 

A r47 

0 

IV-n. 

T47. 

('!(-.  (17 

AT4  8 

0 

■r2(). 

T48. 

C18 

AT40 

0 

T2(.. 

T4'i, 

CM 

AT  SO 

0 

TSO. 

TSl  . 

('20.  C21. 

C22 

ATS  2 

0 

T.’.n. 

TSZ. 

C2( 

ATS  \ 

0 

T2(.. 

TS  f. 

C24 

ATS4 


(1 


rZ(.,  TH4. 


a 


H 


DFG  Model 
Task  Node 


AT59 

AT61 

AT63 


AT65 


AT67 


AT68 


AT71 

AT72 


AT79 


AT80 


AT81 


AT85 


AT86 


Procea  sor 


Formal  DFG  Task  .ind  Control  Selector 
Node  a Included 


Tb6 


T70.  T71.  T73 

Tb9,  T72,  T93.  T94,  T95,  T96.  T97.  T98 
C29 

T74.  T75.  T76.  T77,  T78.  T79 


T8I  . T82.  T83.  T84 


AT89 

AT90 

AT91 

AT92 


ATlOl 


T08,  TlOl 


ATI  02 


T08.  T102 


ATI  04 


T58.  T104 


Section II
SYSTEM PARAMETERS

The system parameters are those mission parameters which
are pertinent to executive performance. The values of these parameters
are derived by examination of the constructed DFG model (discussed in
Section 1.2) together with the supplementary information provided with
the formal mission DFG.

Some data provided with the formal DFG, though necessary in
the process of construction of the DFG model, is not directly pertinent
to executive performance; for example, the main memory space required
by each task. The information which is pertinent is the rates at which
clock firings and the asynchronous events which drive the DFG occur.
By following the flow of control from clock and external pins on the DFG
model the rate at which each node is activated can be estimated. In some
cases assumptions about the probable action of control selectors, gates,
and tasks which can signal must be made. In general, the worst case
conditions (i.e., those which maximize activity) are assumed. In
particular, all gates are assumed open unless some mutually exclusive
relationship is specified, in which case the most severe of the mutually
exclusive gating conditions is assumed.

Once this detailed information about node activity has been produced,
the rate of each executive activity can be computed. Each executive activity
can be timed by straightforward examination of the machine instructions
executed. By combining these parameters, the total worst case executive
execution time overhead may be computed. In addition, bus loading can
be computed based on the activity of nodes which are attached to I/O
devices (and remote links) together with the supplementary information
to the formal DFG which provides the number of words transferred to or
from each device.

Computation of executive space overhead is more easily performed.
Program space overhead is simply the total of the space required by each
executive procedure. Data space is the total of each data structure size
(e.g., table entry size) times the number of instances of that data structure
required to implement the tuned DFG model.
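As a worked instance of the data space rule, the sketch below totals entry size times instance count using the figures given for the processor 1 ITC data structures in Section 3.1.1.4.2. The Python names are illustrative only; the sizes and counts are copied from that table.

```python
# (structure, entry size in words, number of instances), processor 1 ITC data
ITC_DATA_P1 = [
    ("TK nodes", 26, 10),
    ("TKS nodes", 30, 6),
    ("DI nodes", 12, 1),
    ("RL nodes", 12, 3),
    ("Node table padding", 18, 1),
    ("Pins", 4, 15),
    ("Links vector", 1, 5),
    ("Gates", 4, 7),
    ("Link gates", 8, 7),
    ("Notification controllers", 4, 21),
    ("Misc. global variables", 1, 4),
]


def data_space_words(structures) -> int:
    """Data space = sum over structures of (entry size x instances)."""
    return sum(size * count for _, size, count in structures)


def node_table_words(structures) -> int:
    """Subtotal for the first five rows, which make up the node table."""
    return data_space_words(structures[:5])
```

The node table rows come to 506 words and the whole list to 743 words, which agrees with the processor 1 ITC data total reported later in the section.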




2. 1 Processor 0 System Parameters

CLOCKS:

Name     Rate                # Pins
         (firings/second)    attached

P2       32                  2
RPA      20                  1
P19      19                  1
P3       16                  2
P12      12                  1
P13      10                  4
P1       8                   2
P4       4                   3
P6       2                   1
P7       1.25                1
P9       1.11                1
P8       1.00                2
P10      0.476               2
CLK      0.150               1
P15      0.016               1

Totals:  15 clocks, 127 firings/sec
         25 clock pins, 223 pins/sec
         37 simultaneous firings/sec
         90 clock interrupts/sec
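The totals under the clock table can be checked from the individual rows. The sketch below assumes the clock names as read from the table and treats "pins/sec" as the sum of rate times pins attached, which is an interpretation rather than something the report states.

```python
import math

# (name, firings per second, pins attached), as listed in the CLOCKS table
CLOCKS_P0 = [
    ("P2", 32, 2), ("RPA", 20, 1), ("P19", 19, 1), ("P3", 16, 2),
    ("P12", 12, 1), ("P13", 10, 4), ("P1", 8, 2), ("P4", 4, 3),
    ("P6", 2, 1), ("P7", 1.25, 1), ("P9", 1.11, 1), ("P8", 1.00, 2),
    ("P10", 0.476, 2), ("CLK", 0.150, 1), ("P15", 0.016, 1),
]


def clock_totals(clocks):
    """Return (clock count, firings/sec, pin count, pin firings/sec)."""
    firings = sum(rate for _, rate, _ in clocks)
    pins = sum(n_pins for _, _, n_pins in clocks)
    pin_firings = sum(rate * n_pins for _, rate, n_pins in clocks)
    return len(clocks), firings, pins, pin_firings
```

Under that interpretation the rows give 15 clocks, about 127.0 firings per second, 25 pins, and about 222.5 pin firings per second, consistent with the reported 223 if that figure was rounded up.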


NODES:

Type                        Quantity    Activations
                                        (per second)

Tasks
  Periodic                  9
  Conditionally periodic    14
  Aperiodic                 6
                            29          157
Control selector            1           8
Discarder
  32/5                      2           64
  32/10                     1           16
  5/4                       1           5
Data identity               5           29
Remote link                 7           60
Gate                        1           0
Simple identity             1           8
Inverter                    2           0

Totals                      50 nodes    347 activations/sec

LINKS:

Type            Number Posted    Rate (per second)

Output links


DATA ACCESS CONTROL:

Activity                              Rate (per second)

Process begin access list             246
Process end access list               246
Process access controller
  I/O request                 226
  Static copy                  44
  Dynamic block allocation     32
                                      302
Word copied by MOV                    461


I/O:

Activity                                       Rate (per second)

Master to remote switch                        100
Remote to master switch                        100
Master I/O complete notification(s) queued     90
Remote I/O complete notification(s) queued     5
I/O complete notification                      216
Data transmitted (command word)                507
Data word transmitted                          3000
I/O request
  Static data                 195
  Dynamic data                 32
                                               227

2. 2 Processor 1 System Parameters

CLOCKS:

Name     Rate                # Pins
         (firings/second)    attached

P2       32                  2
RPA      20                  1
P11      20                  2
P3       16                  1
P14      15                  1

Totals:  5 clocks, 103 firings/sec
         7 clock pins, 155 pins/sec
         36 simultaneous firings/sec
         67 clock interrupts/sec

NODES:

Type                        Quantity    Activations
                                        (per second)

Tasks
  Periodic                  3
  Conditionally periodic    6
  Aperiodic                 5
                            14          224
Data identity               1           1
Remote link                 3           2

Totals                      18 nodes    227 activations/sec


LINKS:

Type             Number posted    Rate (per second)

Output links     0                163
                 1                64
                                  227
Consume links    0                227

Total                             454


DATA ACCESS CONTROL:

Activity                              Rate (per second)

Process begin access list             227
Process end access list               227
Process access controller
  I/O request                 108
  Static copy                  40
  Begin read dynamic          352
  End read dynamic            352
                                      852
Word copied by MOV                    560

Section III
TUNING THE EXECUTIVE

The OSC executive which results from the construction process
described in Section I is called the baseline executive for the mission.
It is possible that this baseline executive uses so much processor time
and space that the mission's resource requirements cannot be met.
Hence, it is important that the performance of the baseline executive
be calculated and that the executive be tuned if the performance is
inadequate.

This section describes in detail the calculation of baseline
executive performance for the DAIS mission, the process of tuning the
executive, and the performance of the tuned executive.


Processor                        Time/Second       Space
                                 (milliseconds)    (words)

Processor 0
  J73I Programs                  296.096           7017
  Assembly Language Programs     74.489            1301
  Tables                                           3635
  Total                          370.585           12045
  Overhead                       37.1%             36.8%

Processor 1
  J73I Programs                  332.476           7017
  Assembly Language Programs     77.882            1301
  Tables                                           2720
  Total                          410.358           11130
  Overhead                       41.0%             34%

This section presents the time and space statistics for the baseline
executives. Summaries are presented in Figures 2 and 3. It should be
noted that the baseline executive is generalized; that is, it is not dependent
on or tailored to the application DFG. The space statistics presented for
each cluster consider all programs. Separate figures are presented for
the executives containing only the necessary programs.

The executives have not been structured to minimize overhead
related to compiler deficiencies since this will be handled in the tuning
process. The inability to enable and disable the processing of interrupts
as inline functions and the lack of double precision fixed point items
(requiring assembler procedures for subtracting, adding and comparing these
values) contributed over 3% time overhead to each processor.

Cluster      Processor 0    Processor 1

ITC          91.436         85.591
TIM          83.011         63.532
DAC          30.123         58.165
SCH          34.635         49.415
DTF          85.775         88.803
MSM          6.829          18.983
DSP          38.776         45.869

Total        370.585        410.358
Overhead     37.1%          41.0%

Fig. 2 Baseline Executive Timing Statistics
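The totals and overhead percentages in Figure 2 can be reproduced from the per-cluster rows. The sketch below assumes that the timing figures are milliseconds of executive time per second of real time, so that overhead is the total divided by 1000 ms; the variable and function names are illustrative.

```python
# Executive time per cluster, in milliseconds per second (Figure 2)
TIMING_MS = {
    "ITC": (91.436, 85.591),
    "TIM": (83.011, 63.532),
    "DAC": (30.123, 58.165),
    "SCH": (34.635, 49.415),
    "DTF": (85.775, 88.803),
    "MSM": (6.829, 18.983),
    "DSP": (38.776, 45.869),
}


def total_ms(processor: int) -> float:
    """Sum of cluster times for processor 0 or 1, rounded to microseconds."""
    return round(sum(times[processor] for times in TIMING_MS.values()), 3)


def overhead_percent(processor: int) -> float:
    """Executive time as a percentage of each 1000 ms of processor time."""
    return round(total_ms(processor) / 1000.0 * 100.0, 1)
```

The cluster rows sum to 370.585 ms and 410.358 ms, giving 37.1% and 41.0% under the stated assumption.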


             Total Executive                Executive with programs used by DFG
Cluster      Processor 0    Processor 1     Processor 0    Processor 1

ITC          5093           4151            4411           3129
TIM          888            808             810            730
DAC          1387           1312            1099           958
SCH          332            332             332            332
DTF          3484           3020            3484           3020
MSM          426            1072            426            1072
DSP          435            435             435            435

Total        12045          11130           10997          9676
Overhead     36.8%          34.0%           33.6%          29.5%

Fig. 3 Baseline Executive Space Statistics

Processor                        Time/Second       Space
                                 (milliseconds)    (words)

Processor 0
  J73I Programs                  88.738            3392
  Assembly Language Programs     2.698             16
  Tables                                           1685
  Total                          91.436            5093
  Overhead                       9.1%              15.5%

Processor 1
  J73I Programs                  83.866            3392
  Assembly Language Programs     1.725             16
  Tables                                           743
  Total                          85.591            4151
  Overhead                       8.6%              12.7%

3. 1. 1. 1 Approach

The Intertask Communication cluster is responsible for overall
control of DFG interpretation. The primary data structures which control
the interpretation are the node table (NDTBL), the pin table (PINTBL),
and the link table (LNKTBL). The tasks performed by ITC include:

• Processing of events signaled through pins.

• Posting of enabled and disabled links.

• Processing of active nodes.

• Initiation of I/O activities (via DAC).

• Processing of I/O complete notifications.

3. 1. 1. 2 Definition of ITC Activities

Timing statistics for ITC were derived from the time required to
perform each ITC activity together with the rate at which each activity
must be performed. The rates for each activity were obtained directly
from the tuned DFG. Since the worst case assumption that all non-mutually
exclusive gates were open was made, the timing statistics correspond to
peak load conditions.
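The derivation is a product of per-activity time and activity rate. As a worked instance, the sketch below uses the SIGNL figures from the processor 1 table in Section 3.1.1.3.2 (75.2 microseconds per entry at 155 entries per second, plus 48.2 microseconds per additional pin at 50 per second); the function name is illustrative.

```python
def activity_ms_per_second(time_us: float, rate_per_second: float) -> float:
    """Executive time consumed per second by one activity, in milliseconds."""
    return round(time_us * rate_per_second / 1000.0, 3)


# SIGNL on processor 1: (processing time in microseconds, occurrences/second)
SIGNL_P1 = {
    "each entry": (75.2, 155),
    "each additional pin": (48.2, 50),
}

signl_parts_ms = {
    name: activity_ms_per_second(time_us, rate)
    for name, (time_us, rate) in SIGNL_P1.items()
}
signl_total_ms = round(sum(signl_parts_ms.values()), 3)
```

The two components come to 11.656 ms and 2.410 ms, for the 14.066 ms per second listed against SIGNL in that table.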

Pins  are  signaled  primarily  through  clock  firings  which  occur  at 
the  rate  of  127  and  103  per  second  in  processors  0 and  1,  respectively. 
Some  clocks  control  more  than  one  pin  resulting  in  90  and  50  additional 
pin  firings  per  second.  Associated  with  most  of  these  firings  are  task 
starts  which  occur  at  the  rate  of  157  and  224  (worst  case)  in  processors 
0 and  1,  respectively.  In  addition,  other  nodes  are  activated  by  pin  firings 
and  task  completions  resulting  in  a total  of  347  and  227  active  nodes  proc- 
essed per  second  (again,  worst  case). 

Notifications of I/O completion occur at the rate of 211 and 105
active notifies, and 5 and 61 passive notifies per second in processors
0 and 1, respectively.

3. 1. 1. 3 ITC Baseline Detailed Timing Statistics

3. 1. 1. 3. 1 Processor 0

Procedure              Processing Time             Occurrences   Total
                       (microseconds)              /Second       (milliseconds)

J73I Programs

ITCACT
  Each entry           18.6 + ENABLE               90            1.674
  Each notify          33.0                        211           6.963
                                                                 8.637   + 90 ENABLE
ITCPAS
  Each entry           18.6 + ENABLE               5             0.093
  Each notify          37.4                        5             0.187
                                                                 0.280   + 5 ENABLE
NOTIFY
  Pin notify           46.4                        57            2.645
  Node notify          49.2                        159           7.823
                                                                 10.468
ENDTK                  52.8 + ENDTSK + DROP        133           7.022   + 133 ENDTSK + 133 DROP
                         + EACCSS                                        + 133 EACCSS
ENDTKS                 107.2 + ENDTSK + DROP       24            2.573   + 24 ENDTSK + 24 DROP
                         + EACCSS + TIME                                 + 24 EACCSS + 24 TIME
ACTIVE                 33.4                        347           11.590
NCSN                   69.4                        8             0.555
NDC
  32/5 ratio
    No output          51.2                        54            2.765
    Output             59.2                        10            0.592
  32/10 ratio
    No output          51.2                        11            0.563
    Output             59.2                        5             0.296
  5/4 ratio
    No output          51.2                        1             0.051
    Output             59.2                        4             0.237
                                                                 4.504
NDI                    59.6 + BACCSS + EACCSS      29            1.728   + 29 BACCSS + 29 EACCSS
NRL                    63.8 + BACCSS + EACCSS      60            3.828   + 60 BACCSS + 60 EACCSS
                         + SEND                                          + 60 SEND
NSI                    34.6                        8             0.277
NTK                    40.4 + BACCSS + SCHED       157           6.343   + 157 BACCSS + 157 SCHED
SIGNL
  Each entry           75.2                        127           9.550
  Each additional pin  48.2                        91            4.386
                                                                 13.936
EPINS                  74.0                        24            1.776
PLINKS
  Each entry           26.8                        427           11.444
  1 link posted        24.4                        59            1.440
  3 links posted       73.2                        32            2.342
                                                                 15.226

Assembler Programs

CNDPRC                 7.6                         347           2.637
EVLPRD                 7.6                         8             0.061

Total                                                            91.436  + 95 ENABLE
                                                                         + 157 ENDTSK
                                                                         + 157 DROP
                                                                         + 246 EACCSS
                                                                         + 246 BACCSS
                                                                         + 24 TIME
                                                                         + 60 SEND
                                                                         + 157 SCHED

3. 1. 1. 3. 2 Processor 1

Procedure              Processing Time             Occurrences   Total
                       (microseconds)              /Second       (milliseconds)

J73I Programs

ITCACT
  Each entry           18.6 + ENABLE               60            1.116
  Each notify          33.0                        105           3.465
                                                                 4.581   + 60 ENABLE
ITCPAS
  Each entry           18.6 + ENABLE               50            0.930
  Each notify          37.4                        61            2.281
                                                                 3.211   + 50 ENABLE
NOTIFY
  Static pin           46.4                        2             0.093
  Static node          49.2                        105           5.166
  Dynamic nil          80.4 + RETCOR               61            4.904
  Dynamic node         102.2 + RETCOR              1             0.102
                                                                 10.265  + 62 RETCOR
ENDTK                  52.8 + ENDTSK + DROP        121           6.389   + 121 ENDTSK + 121 DROP
                         + EACCSS                                        + 121 EACCSS
ENDTKS                 107.2 + ENDTSK + DROP       103           11.042  + 103 ENDTSK + 103 DROP
                         + EACCSS + TIME                                 + 103 EACCSS + 103 TIME
ACTIVE                 33.4                        227           9.252
NDI                    59.6 + BACCSS + EACCSS      1             0.060   + 1 BACCSS + 1 EACCSS
NRL                    43.8 + BACCSS + EACCSS      2             0.088   + 2 BACCSS + 2 EACCSS
NTK                    40.4 + BACCSS + SCHED       224           9.050   + 224 BACCSS + 224 SCHED
SIGNL
  Each entry           75.2                        155           11.656
  Each additional pin  48.2                        50            2.410
                                                                 14.066
EPINS                  74.0                        52            3.848
PLINKS
  Each entry           26.8                        390           10.452
  1 link posted        24.4                        64            1.562
                                                                 12.014

Assembler Programs

CNDPRC                 7.6                         227           1.725

Total                                                            85.591  + 110 ENABLE
                                                                         + 62 RETCOR
                                                                         + 224 ENDTSK
                                                                         + 224 DROP
                                                                         + 103 TIME
                                                                         + 227 EACCSS
                                                                         + 227 BACCSS
Overhead                                                         8.6%

3. 1. 1. 4 ITC Detailed Space Statistics

3. 1. 1. 4. 1 Processor 0


J73I  Programs 

ITCINT 

100 

EGATES 

28 

EGATE 

34 

DGATES 

28 

DGATE 

32 

ITCACT 

48 

ITCPAS 

48 

NOTIFY 

176 

ELGATE 

128 

ILNK 

106 

ICNDW 

20 

DLGATE 

126 

RLNK 

128 

DCNDW 

22 

ENDTK 

48 

ENDTKS 

98 

ACTIVE 

30 

NCSN 

86 

NDC 

86 

NDI 

70 

NIV 

52 

NGT 

52 

NRL 

76 

NSI 

28 

NTK 

70 

SIGNL 

80 

QSIGNAL 

48 

DQSIGNAL 

66 

EPINS 

78 

PLINKS 

58 

DLINKS 

36 

ENBLGATE 

34 

DSBLGATE 

34 

SGNLEVENT 

34 

SIGAC 

20 

Program  Data  Space 

502 

J73I  Programs 

Not  Used  by  DEG 

ENDl.G 

24 

DNDLG 

28 

NCI 

50 

NCN 

48 

NCS 

1 10 

NDP 

00 

NDSS 

96 

Words 


2710 


w 


48 


NGS 

50 

DPI  NS 

84 

SIGNLDEVENT 

34 

J73I Programs Not Used by DFG, subtotal        682
Total J73I Programs                            3392

Assembler Programs
CNDPRC/EVLPRD                                  16

Data (see below)                               1685

TOTAL                                          5093
Overhead                                       15.5%

ITC Data Structure               Size       Occurrences    Total
                                 (words)

Nodes (ND)
  TK nodes                       26         20             520
  TKS nodes                      30         11             330
  DI nodes                       12         5              60
  RL nodes                       12         7              84
  SI nodes                       8          1              8
  IV nodes                       10         2              20
  GTN nodes                      10         3              30
  DC nodes                       14         4              56
  CSN nodes                      10         1              10
  Padding                        20         1              20
                                                           1138
Pins (PIN)                       4          43             172
Links Vector (LNK)               1          31             31
Gates                            4          17             68
Link Gates (LGT)                 8          18             144
Notification Controllers (NTF)   4          33             132
Misc. Global Variables           1          4              4

Total                                                      1685


3.  1.  1.  4.  2 Processor  1 


                                          Words
J73I Programs
    ITCINT                   100
    EGATES                    28
    EGATE                     34
    DGATES                    28
    DGATE                     32
    ITCACT                    48
    ITCPAS                    48
    NOTIFY                   176
    ELGATE                   128
    ILNK                     106
    ICNDW                     20
    DLGATE                   126
    RLNK                     128
    DCNDW                     22
    ENDTK                     48
    ENDTKS                    98
    ACTIVE                    30
    NDI                       70
    NRL                       76
    NTK                       70
    SIGNL                     80
    QSIGNAL                   48
    DQSIGNAL                  66
    EPINS                     78
    PLINKS                    58
    ENBLGATE                  34
    DSBLGATE                  34
    SIGNLEVENT                34
    SIGAC                     20
    Program Data Space       502
                                           2370
J73I Programs Not Used by DFG
    ENDLG                     24
    DNDLG                     28
    NCI                       50
    NCN                       48
    NCS                      110
    NCSN                      86
    NDC                       86
    NDP                       00
    NDSS                      96
    NFP                       [illegible]
    NIV                       52
    [illegible]               50
    NJN                       66
    NGS                       50
    NGT                       52
    NSI                       28
    DPINS                     84
    DLINKS                    36
    SIGNLDEVENT               34
                                           1022
                                           3392
Assembler Programs
    CNDPRC/EVLPRD                            16

Data (see below)                            743

Total                                      4151
Overhead                                  12.7%


ITC Data Structure                 Size      Occurrences    Total
Node Table
    TK nodes                         26          10           260
    TKS nodes                        30           6           180
    DI nodes                         12           1            12
    RL nodes                         12           3            36
    Padding                          18           1            18
                                                              506
Pins                                  4          15            60
Links Vector                          1           5             5
Gates                                 4           7            28
Link Gates                            8           7            56
Notification Controllers              4          21            84
Misc. Global Variables                1           4             4

Total                                                         743


3.  1.  1.  5 ITC  Sensitivity  Analysis 


The most important factors affecting ITC overhead are the number
of nodes and the rates at which they become active. ITC processes
active nodes at a rate of 347 per second in processor 0 and 227 per
second in processor 1. Handling these active nodes and posting their
attached links accounts for 55% of the time spent in ITC. In particular,
active node dispatching alone (procedure ACTIVE) accounts for
approximately 12% of ITC overhead.

Handling of DTF I/O complete notifications, at rates of 216 per second
in processor 0 and 167 per second in processor 1, accounts for
approximately 20% of ITC time.

Signaling of pins proceeds at rates of 225 and 205 per second in
processors 0 and 1, respectively, accounting for approximately 15% of
ITC overhead.


3.1.2 TIM Baseline Statistics

Processor                          Time/Second       Space
                                   (milliseconds)    (words)
Processor 0
    J73I Programs                     45.539           589
    Assembly Language Programs        37.472           159
    Tables                                             140
    Total                             83.011           888
    Overhead                           8.3%            2.7%

Processor 1
    J73I Programs                     28.413           589
    Assembly Language Programs        35.119           159
    Tables                                              60
    Total                             63.532           808
    Overhead                           6.4%            2.5%

3.1.2.1 Approach

The Timing cluster has several primary tasks:

• maintain system time

• notify tasks when clock interrupts occur

These tasks are performed by maintaining a list of clocks in each processor.
Processor 0 has 15 clocks; processor 1 has 5. Clock A interrupts are
fielded by INTCKA, which invokes CLOCKQ to signal the appropriate tasks.

CLOCKQ also maintains the clock queue by inserting the clock with the new
firing time into the queue at the appropriate place. The procedure SETCLOCK
is called to set timer A to interrupt at the firing time for the first clock
on the list. If the time has already passed, SETCLOCK returns a value
indicating this.

Clock B is used to synchronize time in each processor. This is done
by INTCKB, which is called once every 6.5537 seconds.
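
The clock queue handling described above can be sketched as follows. This is an illustrative sketch only, written in Python rather than J73I; the class and method names, the use of integer ticks, and the assumption that every clock is periodic are choices made for the example and are not taken from the executive itself.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class Clock:
    firing_time: int                     # next firing time, in timer ticks
    name: str = field(compare=False)
    period: int = field(compare=False)   # ticks between firings (assumed > 0)


class ClockQueue:
    def __init__(self, clocks):
        self.queue = sorted(clocks)      # earliest firing time first
        self.timer = None

    def set_clock(self, now):
        """Loose analogue of SETCLOCK: arm the timer for the first clock
        on the list and report whether that time is still in the future."""
        self.timer = self.queue[0].firing_time
        return self.timer > now

    def clock_interrupt(self, now):
        """Loose analogue of CLOCKQ: signal the head clock, reinsert it
        with its new firing time, and re-arm the timer. If the next firing
        time has already passed (a simultaneous or missed firing), it is
        processed at once without waiting for another interrupt."""
        fired = []
        while True:
            clock = self.queue.pop(0)
            fired.append(clock.name)
            clock.firing_time += clock.period
            bisect.insort(self.queue, clock)   # ordered insertion
            if self.set_clock(now):
                return fired
```

With three clocks, two of which share a firing time, a single call at that time signals both and leaves the timer armed for the next distinct firing time.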
3.1.2.2 Definition of TIM Activities

The processor may be in the executive or an application task
when an interrupt occurs. There are 127 clock timings per second
on the processor 0 DFG, and 103 for processor 1. Of these, at most 90
on processor 0 and 67 on processor 1 are actual interrupts causing INTCKA
and CLOCKQ to be entered. The remainder are simultaneous firings for
which only one interrupt occurs.

The scheduler must also manipulate times in order to schedule
application tasks. In addition, TIME is called by a number of clusters.
3.1.2.3 TIM Baseline Detailed Timing Statistics

3.1.2.3.1 Processor 0

Procedure                    Processing Time    Occurrences    Total
                             (microseconds)     /Second        (milliseconds)
J73I Programs
  CLOCKQ
    Times entered            375.7 + ENABLE          90         33.814
    Simultaneous firing      316.9                   37         11.725
                                                                45.539
                                                                + 90 ENABLE
Assembler Programs
  SETCLOCK
    Timer set (worst case)    21.8                   90          1.962
  TSUM
    From CLOCKQ               20.4                  127          2.591
    From SCHED                20.4                  157          3.203
                                                                 5.790
  TGTR
    TRUE from CLOCKQ          23.8                  655         15.597
    FALSE from CLOCKQ         21.8                  127          2.769
    TRUE from SCHED           23.8                  236          5.617
    FALSE from SCHED          21.8                  235          5.123
                                                                29.106
  TIME                        12.8                   48          0.614

Total                                                           83.011
                                                                + 90 ENABLE
Overhead                                                         8.3%
3.1.2.3.2 Processor 1

Procedure                    Processing Time    Occurrences    Total
                             (microseconds)     /Second        (milliseconds)
J73I Programs
  CLOCKQ
    Times entered            296.4 + ENABLE          67         19.859
    Simultaneous firing      237.6                   36          8.554
                                                                28.413
                                                                + 67 ENABLE
Assembler Programs
  SETCLOCK
    Timer set (worst case)    21.8                   67          1.461
  TSUM
    From CLOCKQ               20.4                  103          2.101
    From SCHED                20.4                  224          4.570
  TGTR
    TRUE from CLOCKQ          23.8                  285          6.783
    FALSE from CLOCKQ         21.8                  103          2.245
    TRUE from SCHED           23.8                  336          7.997
    FALSE from SCHED          21.8                  336          7.325
                                                                24.350
  TIME                        12.8                  206          2.637

Total                                                           63.532
                                                                + 67 ENABLE
Overhead                                                         6.4%

3.1.2.4 TIM Baseline Detailed Space Statistics

3.1.2.4.1 Processor 0

                                          Words
J73I Programs
    TIMINT                    20
    STARTCLOCK               112
    CLOCKQ                   154
    SETALARM                 102
    CLRALARM (not used)       78
    Program Data Space       123
                                            589
Assembler Programs
    Interrupt handlers        40
    SETCLOCK                  24
    TIME                      16
    TSUM/TDIF/TGTR            64
    Program Data Space        15
                                            159
Data
    Clock Table              136
    Miscellaneous items        4
                                            140

Total                                       888
Overhead                                   2.7%

3.1.2.4.2 Processor 1

                                          Words
J73I Programs
    Same as Processor 0                     589
Assembler Programs
    Same as Processor 0                     159
Data
    Clock Table               56
    Miscellaneous items        4
                                             60

Total                                       808
Overhead                                   2.5%


3.1.2.5 Sensitivity Analysis

The prime determinant of overhead is the structure of the clock
queue. Each attempt to insert a clock on the list requires 57.0
microseconds for each clock on the list with an earlier firing time.
Processors 0 and 1 average 5.16 and 2.77 clocks, respectively, for each
insert. This is 3.7% overhead on processor 0 to insert clocks 127 times
each second, and 1.6% overhead for processor 1.

Two other factors affecting overhead are clocks that fire simultaneously
and missed firing times. When two or more clocks fire at the same time,
only one clock interrupt is taken. When a firing time is missed, the
interrupt is processed even though the physical interrupt did not occur.
Both of these actions take less time than actual interrupts.
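
The insertion overhead figures quoted above can be reproduced with a short calculation. The sketch below is illustrative only; the function name is hypothetical, and the figure of 103 inserts per second for processor 1 is taken from the clock timing rate given in the definition of TIM activities rather than from this section.

```python
INSERT_COST_US = 57.0   # microseconds per earlier-firing clock passed during an insert


def insertion_overhead_percent(avg_clocks_passed, inserts_per_second):
    """Percentage of each second spent searching the clock queue on inserts."""
    busy_us = INSERT_COST_US * avg_clocks_passed * inserts_per_second
    return 100.0 * busy_us / 1_000_000


p0 = insertion_overhead_percent(5.16, 127)   # processor 0
p1 = insertion_overhead_percent(2.77, 103)   # processor 1
print(f"processor 0: {p0:.1f}%  processor 1: {p1:.1f}%")
```

Because the cost is linear in the average number of clocks passed, halving the average search depth would halve this part of the TIM overhead.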


3.1.3 DAC Baseline Statistics

Processor                          Time/Second       Space
                                   (milliseconds)    (words)
Processor 0
    J73I Programs                     28.479          1044
    Assembly Language Programs         1.644            16
    Tables                                             327
    Total                             30.123          1387
    Overhead                           3.0%            4.2%

Processor 1
    J73I Programs                     56.389          1044
    Assembly Language Programs         1.776            16
    Tables                                             252
    Total                             58.165          1312
    Overhead                           5.8%            4.0%

3.1.3.1 Approach


The  Data  Access  Control  cluster  manages  access  to  global  data 
(data  accessed  by  more  than  one  task),  and  initiation  of  I/O  activities 
(via  DTF).  DAC  interfaces  with  ITC  via  the  interface  procedures  BACCSS 
and  EACCSS.  These  procedures  process  lists  of  access  controllers  which 
contain  information  about  the  type  of  access  required.  According  to  the 
access  type,  control  is  dispatched  within  DAC  to  process  each  requested 
access  operation. 
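
The dispatch on access type described above can be illustrated with a small table-driven sketch. Everything in it is hypothetical: the two access types, the handler names and the dictionary layout of an access controller are invented for the example and do not correspond to the actual DAC procedures or data structures.

```python
def begin_read(ac):
    return f"begin read of {ac['block']}"


def begin_write(ac):
    return f"begin write of {ac['block']}"


def end_read(ac):
    return f"end read of {ac['block']}"


def end_write(ac):
    return f"end write of {ac['block']}"


# One handler per (phase, access type); "begin" plays the role of BACCSS
# and "end" the role of EACCSS in this sketch.
HANDLERS = {
    ("begin", "read"): begin_read,
    ("begin", "write"): begin_write,
    ("end", "read"): end_read,
    ("end", "write"): end_write,
}


def process_access_list(phase, access_list):
    """Walk a list of access controllers and dispatch each one by type."""
    return [HANDLERS[(phase, ac["type"])](ac) for ac in access_list]


access_list = [{"type": "read", "block": "NAV"},
               {"type": "write", "block": "FUEL"}]
begun = process_access_list("begin", access_list)
```

The cost structure in the timing tables follows the same shape: a fixed cost per list entry plus a cost per access controller that depends on its type.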


3.1.3.2 Definition of DAC Activities

[...] respectively. Each call involves the processing of an access list,
resulting in a number of accesses of each possible type occurring each
second. The rates at which each access type occurs were collected from
the rate of each node activation together with the accesses required by
that node as specified on the tuned DFG.


3.1.3.3 DAC Baseline Detailed Timing Statistics

3.1.3.3.1 Processor 0

Procedure                    Processing Time    Occurrences    Total
                             (microseconds)     /Second        (milliseconds)
J73I Programs
  BACCSS
    Each entry                20.4                  246          5.018
    Each AC processed         24.8                  130          3.224
                                                                 8.242
  EACCSS
    Each entry                20.4                  246          5.018
    Each AC processed         24.8                  172          4.266
                                                                 9.284
  BWD                         42.8 + GETCOR          32          1.370
                                                                + 32 GETCOR
  ESWS                        48.6                   34          1.652
  BSRS                        68.4                   10          0.684
  CSTN                        38.0 + SEND             8          0.304
                                                                + 8 SEND
  CST                         33.4 + SEND            25          0.835
                                                                + 25 SEND
  USTN                        32.0 + SEND           151          4.832
                                                                + 151 SEND
  UST                         25.8 + SEND            10          0.258
                                                                + 10 SEND
  UDT                         31.8 + SEND            32          1.018
                                                                + 32 SEND
Assembler Programs
  CPYD
    Each entry                16.4                   44          0.722
    Words copied               2.0                  461          0.922
                                                                 1.644

Total                                                           30.123
                                                                + 32 GETCOR
                                                                + 226 SEND
Overhead                                                         3.0%


3.1.3.3.2 Processor 1

Procedure                    Processing Time    Occurrences    Total
                             (microseconds)     /Second        (milliseconds)
J73I Programs
  BACCSS
    Each entry                20.4                  227          4.631
    Each AC processed         24.8                  407         10.094
                                                                14.725
  EACCSS
    Each entry                20.4                  227          4.631
    Each AC processed         24.8                  445         11.036
                                                                15.667
  BARD                        31.8                  352         11.194
  ERD                         26.8                  352          9.434
  ESWS                        48.6                   40          1.944
  USTN                        32.0 + SEND           103          3.296
                                                                + 103 SEND
  UST                         25.8 + SEND             5          0.129
                                                                + 5 SEND
Assembler Programs
  CPYD
    Each entry                16.4                   40          0.656
    Words copied               2.0                  560          1.120
                                                                 1.776

Total                                                           58.165
                                                                + 108 SEND
Overhead                                                         5.8%


3.1.3.4 DAC Baseline Detailed Space Statistics

3.1.3.4.1 Processor 0

                                          Words
J73I Programs
    DACINT                   150
    BACCSS                    76
    EACCSS                   100
    BARS                      36
    BWD                       42
    ESWS                      46
    CSTN                      34
    CST                       30
    USTN                      28
    UST                       22
    UDT                       34
    Program Data Space       158
                                            756
J73I Programs Not Used by DFG
    BARD                      34
    BSRD                      22
    BSRS                      20
    EAWD                      54
    EAWS                      38
    ERD                       40
    ESWD                      80
                                            288
Assembler Programs
    CPYD                      16
                                             16
Data (see below)                            327

Total                                      1387
Overhead                                   4.2%


DAC Data Structure                   Size      Occurrences    Total
                                     (words)
Access Controllers (AC)                 4          62           248
Dynamic Storage Controllers (DSC)       4           1             4
Static Storage Controllers (SSC)        4           5            20
Extra Static Blocks (SSB)               4           3            12
  for extra copies of                  17           1            17
  certain data blocks                  26           1            26

3.1.3.4.2 Processor 1

                                          Words
J73I Programs
    DACINT                   150
    BACCSS                    76
    EACCSS                   100
    BARD                      34
    BARS                      36
    ERD                       40
    ESWS                      46
    USTN                      28
    UST                       22
    Program Data Space       158
J73I Programs Not Used by DFG
    BSRD                      22
    BSRS                      20
    BWD                       42
    EAWD                      54
    EAWS                      38
    ESWD                      80
    CSTN                      34
    CST                       30
    UDT                       34
Assembler Programs
    CPYD
Data (see below)                            252

Total                                      1312
Overhead                                   4.0%


DAC Data Structure                   Size      Occurrences    Total
Access Controllers (AC)                            36           144
Dynamic Storage Controllers (DSC)
Static Storage Controllers (SSC)
Extra Static Blocks (SSB)
  for extra copies of
  certain data blocks


3.1.3.5 Sensitivity Analysis

The important factors contributing to DAC overhead are the number
of access lists processed per second and the number of access controllers
in these access lists. Management of the processing of these lists (not
including the processing of each individual access controller) accounts for
approximately 55% of DAC overhead, while individual access controller
processing accounts for the remaining 45%.

In processor 0, 24% of DAC time is spent processing I/O request
access controllers, while in processor 1 only 6% is spent here. In both
processors the rates at which static data is copied are fairly low;
substantial increases in these rates would be required if more frequent
task preemption were allowed. For example, if 400 static blocks averaging
10 words had to be copied per second, DAC overhead in processor 0 would
nearly double. In processor 1 a substantial amount of read access to
dynamic data is handled (352 accesses per second), accounting for 35%
of DAC overhead.




3.  1.  4.  1 Approach 

The  Scheduling  cluster  orders  the  execution  of  application  tasks. 
It  interfaces  with  ITC  and  DAC  via  procedure  SCHED  which  is  called  to 
schedule  a task  for  execution.  SCH  maintains  a queue  structure  which 
holds  all  scheduled  tasks  not  yet  executed  and  all  tasks  which  have  been 
partially  executed  and  then  preempted.  Tasks  in  the  scheduling  queue 
are  ordered  by  deadline  and  by  preemption  rules. 

When  a task  completes,  DROP  is  called  by  ITC  to  remove  the 
task  from  the  scheduling  queue.  The  dispatcher  (DSP)  examines  the 
top  of  the  scheduler  queue  and  either  restarts  the  active  task  or  starts 
a new  task.  If  a new  task  is  started,  AUTASK  is  called  to  manipulate 
the  queue  so  that  the  task  is  on  the  active  task  stack. 




3.  1.  4.  2 Definition  of  Activities 


The important parameter affecting SCH overhead is the number
of task starts per second (157 in processor 0, and 224 in processor 1).
Each task execution implies one SCHED call, one AUTASK call, and
one DROP call. Calls to SPRTIME (two per task start) are not included
in the baseline timing statistics but are discussed separately in Section
3.1.4.5.

At  each  SCHED  call  the  queue  is  searched  until  the  first  pre- 
emptable  task  is  found  (usually  the  idle  task).  The  new  task  is  inserted 
into  the  queue  of  tasks  scheduled  to  preempt  that  task  according  to  its 
deadline. 
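
The search and insertion performed at each SCHED call can be sketched as follows. This is one possible reading of the description above, not the J73I implementation: the `Task` layout, the per-task list of waiting preempters, and the rule that ties go behind earlier arrivals are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    deadline: int
    preemptable: bool = True
    # Tasks scheduled to preempt this one, earliest deadline first.
    preempters: list = field(default_factory=list)


def sched(queue, new_task):
    """Search from the top of the queue for the first preemptable task,
    then insert new_task among its preempters in deadline order."""
    for task in queue:
        if task.preemptable:
            waiting = task.preempters
            i = 0
            # Skip every waiting task that is at least as urgent.
            while i < len(waiting) and waiting[i].deadline <= new_task.deadline:
                i += 1
            waiting.insert(i, new_task)
            return task
    raise ValueError("no preemptable task in the queue")


idle = Task("idle", deadline=10**9)
queue = [Task("io", deadline=5, preemptable=False), idle]
for name, deadline in [("nav", 30), ("display", 20), ("fuel", 30)]:
    sched(queue, Task(name, deadline))
order = [t.name for t in idle.preempters]
```

The two loops correspond to the two per-task costs in the timing tables: one for each task passed over because it is not preemptable, and one for each waiting task that is more urgent than the new one.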




3.1.4.3 SCH Baseline Detailed Timing Statistics

3.1.4.3.1 Processor 0

Procedure                    Processing Time        Occurrences    Total
                             (microseconds)         /Second        (milliseconds)
J73I Programs
  SCHED
    Each entry               106.8 + TSUM + TGTR       157          16.768
    Each task not
      preemptable             28.4                     157           4.459
    Each task more
      urgent                  24.2 + TGTR              157 x 2       7.599
                                                                    28.826
                                                                    + 157 TSUM
                                                                    + 471 TGTR
  AUTASK                      12.4                     157           1.947
  DROP                        24.6                     157           3.862

Total                                                               34.635
                                                                    + 157 TSUM
                                                                    + 471 TGTR
Overhead                                                             3.5%

3.1.4.3.2 Processor 1

Procedure                    Processing Time        Occurrences    Total
                             (microseconds)         /Second        (milliseconds)
J73I Programs
  SCHED
    Each entry               106.8 + TSUM + TGTR       224          23.923
    Each task not
      preemptable             28.4                     224           6.362
    Each task more
      urgent                  24.2 + TGTR              224 x 2      10.842
                                                                    41.127
                                                                    + 224 TSUM
                                                                    + 672 TGTR
  AUTASK                      12.4                     224           2.778
  DROP                        24.6                     224           5.510

Total                                                               49.415
                                                                    + 224 TSUM
                                                                    + 672 TGTR
Overhead                                                             4.9%

3.1.4.4 SCH Baseline Detailed Space Statistics

3.1.4.4.1 Processor 0

                                          Words
J73I Programs
    SCHINT                    12
    SCHED                    120
    AUTASK                    12
    DROP                      28
    SPRTIME                   88
    Program Data Space        72
                                            332
Data
    None

Total                                       332
Overhead                                   1.0%

3.1.4.4.2 Processor 1

                                          Words
J73I Code
    Same as Processor 0                     332
Data
    None

Total                                       332
Overhead                                   1.0%


3.1.4.5 Sensitivity Analysis

Scheduling is the most frequently occurring activity (157 per second
in processor 0, 224 per second in processor 1) which requires a search.
However, the size of the queue at any given time is not particularly large
since it only includes tasks ready to run which have not been run. Hence
much of the scheduler overhead (approximately 50%) is due to loop setups
in SCHED and actual insertion of tasks into the queue. The total time
required by SCH is proportional to the number of task starts per second.

Part of the work performed in scheduling is the numerous calls
to TSUM and TGTR to perform double precision functions. The overhead
for these calls was included in the TIM cluster baseline timing statistics.
Had it been included with SCH, the SCH overhead would be approximately
40% higher than it was in both processors.

Overhead involved in computing spare time for all tasks in the
schedule queue was not included because it is essentially a testing tool.
The overhead was, however, computed separately and the timing statistics
are presented below. Overhead for calls to TIME, TSUM, TGTR
and TDIF is included.

Spare Time Computation Timing Statistics

Procedure                    Processing Time    Occurrences      Total
                             (microseconds)     /Second          (milliseconds)
Processor 0
  SPRTIME
    Each entry                53.0              157 x 2           16.642
    Each task in queue       122.4              157 x 2 x 3      115.300

  Total                                                          131.943
  Overhead                                                        13.2%

Processor 1
  SPRTIME
    Each entry                53.0              224 x 2           23.744
    Each task in queue       122.4              224 x 2 x 3      164.506

  Total                                                          188.250
  Overhead                                                        18.8%

3.1.5 DTF Baseline Statistics

Processor                          Time/Second       Space
                                   (milliseconds)    (words)
Processor 0
    J73I Programs                     67.515          1192
    Assembly Language Programs        18.260           946
    Tables                                            1346
    Total                             85.775          3484
    Overhead                           8.6%           10.6%

Processor 1
    J73I Programs                     70.543          1192
    Assembly Language Programs        18.260           946
    Tables                                             882
    Total                             88.803          3020
    Overhead                           8.9%            9.2%

3.1.5.1 Approach

Processor 0 is specified as the head processor. This requires it
to perform certain tasks not done by processor 1. These include:

• Synchronizing system time in the other processor.

• Coordinating system initialization by taking master
control of the bus, sending an initialization signal,
and passing bus control to the nonhead processor.

Tasks that are performed by both processors include:

• Error recovery and retry associated with bus commands.

• Passing control of the bus from processor to processor
according to a deadline priority scheme.

• Dynamic construction of command word lists from a
fixed memory list as events occur.

Two of these tasks are the prime contributors to time overhead:

• Passing bus control uses 4.8% of processor 0, 7.0%
of processor 1.

• Construction of command word lists uses 4.0% of
processor 0, 1.9% of processor 1.


3.1.5.2 Definition of DTF Activities

The procedure BCI2 is responsible for controlling the bus. There
is a tradeoff between the overhead associated with the number of times it
passes bus control and the ability to meet response requirements if the bus
is held for too long. The timing statistics are based on an average time for
holding the bus of five milliseconds, thereby taking control and relinquishing
it 100 times each second. The average time holding the bus is controlled by
two factors: the length of the command word list and the amount of bus fill
time introduced. Bus fill time is the time the processor with bus master
control uses the bus to receive dummy data from the other processor. The
bus fill time can be set within the BCI2 program.

The system response requirements are such that an average bus
hold time in excess of ten milliseconds would be adequate, thus cutting
the control passing overhead by more than one half. However, it is felt
that a five millisecond average would be more representative for most
systems.
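
The relationship between average bus hold time and control passing cost can be illustrated with a short calculation. The sketch below is a simplification under stated assumptions: it assumes two processors that alternate as bus master with no idle time between holds, and it counts only the BCI2 take and relinquish costs (114.6 and 68.0 microseconds in the detailed timing statistics), not the DTFPAS and DTFKEY processing that also accompanies each transfer. The function names are hypothetical.

```python
TAKE_US = 114.6        # BCI2: take control of bus (microseconds)
RELINQUISH_US = 68.0   # BCI2: relinquish control of bus (microseconds)


def transfers_per_second(avg_hold_ms, processors=2):
    """Times per second one processor takes (and gives up) the bus,
    assuming the processors take turns and the bus is always held."""
    return 1000.0 / (avg_hold_ms * processors)


def bci2_overhead_percent(avg_hold_ms, processors=2):
    """Share of one processor's time spent in BCI2 control passing."""
    n = transfers_per_second(avg_hold_ms, processors)
    return 100.0 * n * (TAKE_US + RELINQUISH_US) / 1_000_000


baseline = bci2_overhead_percent(5.0)    # five millisecond average hold
relaxed = bci2_overhead_percent(10.0)    # ten millisecond average hold
```

Under these assumptions a five millisecond hold gives 100 transfers per second per processor, matching the baseline, and doubling the hold time halves the per-transfer costs.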


3.1.5.3 DTF Baseline Detailed Timing Statistics

3.1.5.3.1 Processor 0

Procedure                          Processing Time    Occurrences    Total
                                   (microseconds)     /second        (milliseconds)
J73I Programs
  DTFACT
    ITCACT to be queued             45.8 + QUEUE           90          4.122
    No ITCACT notification
      requirement                   38.8                   10           .388
    Dynamic storage data
      transmitted                   21.6 + RETCOR          32           .691
                                                                       5.201
                                                             + 32 RETCOR calls
                                                             + 90 QUEUE calls
  DTFPAS
    Each entry                     158.0                  100         15.800
    ITCPAS to be queued              1.4 + QUEUE            5           .007
    Each block of data rec'd        27.4                    5           .137
                                                                      15.944
                                                                   + 5 QUEUE
  DTFKEY                            63.4                  100          6.340
  SENDYN                            66.6                   32          2.130
  SEND
    Allocated entry                 76.2                  195         14.860
    Allocated additional
      command word/entry            55.8                  262         14.620
    Allocated notify                15.8                  195          3.080
    Unallocated entry              104.6                   40          4.180
    Unallocated additional
      command word/entry            84.2                   10           .840
    Unallocated notify              15.8                   20           .320
                                                                      37.900
Assembler Programs
  BCI2
    Take control of bus            114.6                  100         11.460
    Relinquish control of bus       68.0                  100          6.800
                                                                      18.260

Total                                                                 85.775
Overhead                                                               8.6%


3.1.5.3.2 Processor 1

Procedure                          Processing Time    Occurrences    Total
                                   (microseconds)     /second        (milliseconds)
J73I Programs
  DTFACT
    ITCACT to be queued             45.8 + QUEUE           60          3.984
    No ITCACT notification
      req.                          38.8                   40          1.552
                                                                       5.536
                                                             + 60 QUEUE calls
  DTFPAS
    Each entry                     364.0                  100         36.400
    ITCPAS to be queued              1.4 + QUEUE           50           .070
    Each block of statically
      allocated data received       27.4                    5           .137
    Each block of dynamically
      allocated data received       47.4 + GETCOR          60          2.844
                                                                      39.451
                                                             + 60 GETCOR calls
                                                             + 50 QUEUE calls
  DTFKEY                            63.4                  100          6.340
  SEND
    Each entry                      76.2                  110          8.382
    Each additional command
      word/entry                    55.8                  165          9.207
    Each notify                     15.8                  100          1.627
                                                                      19.216
Assembler Programs
  BCI2
    Take control of bus            114.6                  100         11.460
    Relinquish control of bus       68.0                  100          6.800
                                                                      18.260

Total                                                                 88.803
Overhead                                                               8.9%


3.1.5.4 DTF Baseline Detailed Space Statistics

3.1.5.4.1 Processor 0

                                                          Words
J73I Programs
    RDYBUS                                      24
    DTFINT                                     190
    DOWN                                        52
    FLIP                                       168
    DTFPAS                                     208
    DTFACT                                     102
    DTFKEY                                      62
    SENDYN                                      62
    SEND                                       208
    Program Data Space                         116
                                                           1192
Assembler Programs
    BCIU interrupt handlers to transfer control,
    synchronize time, support monitoring of
    other processors, do all bus message retries
    and determine failures
    BCIU initialization                         88
    BCIU interrupt and error handling          846
    Setting new priority                        12
                                                            946
Data
    DEVTBL - Device list table                  36
    CWTBL - Command word table                 636
    CWQCWP - Queued CW indexes                  41
    MNOTFY - Master/ITC notification buffer    110
    RNOTFY - Remote/ITC notification buffer     24
    RCODE - Remote/ITC notification codes        5
    FLPDEV - Failed device CW queue             40
    Miscellaneous Items J73I                    21
    Subaddress Pointer Words                   118
    Storage for Executive 'Signals'              6
    Scratch Storage                             32
    Priorities [left as 4 Processor Case]        5
    Miscellaneous Items Assembler               32
    Command Word Storage                       240
                                                           1346

Total                                                      3484
Overhead                                                  10.6%


3.1.5.4.2 Processor 1

                                                          Words
J73I Programs
    Same as Processor 0                                    1192
Assembler Programs
    Same as Processor 0                                     946
Data
    DEVTBL - Device list table                  40
    CWTBL - Command word table                 246
    CWQCWP - Queued CW indexes (not used)        1
    MNOTFY - Master/ITC notification buffer     30
    RNOTFY - Remote/ITC notification buffer     70
    RCODE - Remote/ITC notification codes       15
    FLPDEV - Failed Device CW Queue             30
    Miscellaneous Items J73I                    21
    Subaddress Pointer Words                   110
    Storage for Executive 'Signals'             12
    Scratch Storage                             32
    Priorities [left as 4 Processor case]        5
    Miscellaneous Items Assembler               30
    Command Word Storage                       240
                                                            882

Total                                                      3020
Overhead                                                   9.2%

3.1.5.5 Sensitivity Analysis

There are two primary factors affecting the overhead of DTF. The
first is the tradeoff between the overhead associated with each bus control
transfer and meeting response requirements. The amount of overhead
increases linearly with each transfer of control; however, if the number
of transfers is reduced, the average length of time the bus is controlled
between transfers of control increases and it becomes more difficult to meet
response requirements.

The second factor is the amount of data that must be transferred
between processors. The effect on the overhead can be seen by examining the
executives. Each time control is received, DTFPAS is entered. In processor 0,
where up to five blocks of data can be received, each entry takes 158.0
microseconds to process, contributing a total of 1.6% to overhead. However,
processor 1 can receive up to fifteen blocks of data. This requires 364.0
microseconds to process, for a total of 3.6% overhead. This is an increase
of 2%, directly attributable to the additional ten blocks of data that can
be transferred.
3.1.6 MSM Baseline Statistics

Processor                          Time/second       Space
                                   (milliseconds)    (words)
Processor 0
    J73I Programs                      6.829           358
    Assembly Language Programs
    Tables                                              68
    Total                              6.829           426
    Overhead                            .7%            1.2%

Processor 1
    J73I Programs                     18.989           358
    Assembly Language Programs
    Tables                                             714
    Total                             18.989          1072
    Overhead                           1.9%            3.2%

3.1.6.1 Approach

Each executive maintains a list of free blocks. Each allocated block
will be 34 words long and use space from the first block on the list. When
a block of storage is returned, it is placed in the list in ascending order by
address. If a block is contiguous at either end, it will be compacted with
the contiguous block to keep one larger block on the list.

3.1.6.2 Definition of MSM Activities

When storage is allocated by a call to GETCOR, it comes from the
first block of storage that is large enough. The block may be the exact size
required, or it may be larger, leaving a smaller block of the remaining space
on the list.

When a block of storage is returned, it may be contiguous to other
blocks on the free list. In addition, it is placed in the appropriate place
on the list.

In processor 0, there are at most two GETCOR calls in a row
followed by two RETCOR calls. In processor 1, there are fourteen
GETCOR calls. This is followed by at most seven GETCOR calls before
any storage is returned.
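
The allocation and return behavior described above corresponds to a first-fit free list kept in address order, with merging of contiguous blocks. The following is a minimal sketch of that technique in Python; the class name, the representation of a free block as an (address, size) pair, and the `None` result when no block fits are assumptions made for the example and are not taken from the J73I procedures.

```python
class FreeList:
    def __init__(self, blocks):
        # Free blocks as (address, size) pairs, kept in ascending address order.
        self.free = sorted(blocks)

    def getcor(self, size):
        """Allocate from the first free block that is large enough."""
        for i, (addr, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]                     # exact fit
                else:
                    # Larger block: the remainder stays on the list.
                    self.free[i] = (addr + size, length - size)
                return addr
        return None                                      # nothing large enough

    def retcor(self, addr, size):
        """Return a block, merging it with contiguous neighbors."""
        i = 0
        while i < len(self.free) and self.free[i][0] < addr:
            i += 1                                       # find place by address
        if i < len(self.free) and addr + size == self.free[i][0]:
            size += self.free[i][1]                      # contiguous at end
            del self.free[i]
        if i > 0 and self.free[i - 1][0] + self.free[i - 1][1] == addr:
            prev_addr, prev_size = self.free[i - 1]      # contiguous at front
            self.free[i - 1] = (prev_addr, prev_size + size)
        else:
            self.free.insert(i, (addr, size))


pool = FreeList([(0, 68)])      # room for two 34-word blocks
first = pool.getcor(34)
second = pool.getcor(34)
pool.retcor(second, 34)
pool.retcor(first, 34)
```

The four RETCOR cases timed in the detailed statistics (not contiguous, contiguous at the front, at the end, or at both ends) correspond to the four combinations of the two merge tests in `retcor`.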

3.1.6.3 MSM Baseline Detailed Timing Statistics

3.1.6.3.1 Processor 0

Procedure                          Processing Time    Occurrences    Total
                                   (microseconds)     /second        (milliseconds)
J73I Programs
  GETCOR                           101.0                   32          3.232
  RETCOR
    First on list, contiguous
      at end                       109.4                   24          2.626
    First on list, not
      contiguous at end            105.0                    4           .420
    Second on list, contiguous
      both sides                   137.8
                                                                       3.597

Total                                                                  6.829
Overhead                                                                .7%

3.1.6.3.2 Processor 1

Procedure                          Processing Time    Occurrences    Total
                                   (microseconds)     /second        (milliseconds)
J73I Programs
  GETCOR
    Exact fit                      151.6                   31          4.700
    Extra space                    160.8                   31          4.985
                                                                       9.685
  RETCOR
    Not contiguous                 137.8                   15          2.067
    Contiguous both ends           154.2                   15          2.313
    Contiguous front               149.8                   16          2.397
    Contiguous end                 142.2                   16          2.275
                                                                       9.298

Total                                                                 18.983
Overhead                                                               1.9%


3.1.6.4 MSM Baseline Detailed Space Statistics

3.1.6.4.1 Processor 0

                                          Words
J73I Programs
    GETCOR                   142
    RETCOR                   166
    MSMINT                    18
    Program Data Space        32
                                            358
Data
    Two blocks                               68

Total                                       426
Overhead                                   1.2%

3.1.6.4.2 Processor 1

                                          Words
J73I Programs
    GETCOR                   142
    RETCOR                   166
    MSMINT                    18
    Program Data Space        32
                                            358
Data
    Twenty-one blocks                       714

Total                                      1072
Overhead                                   3.2%

3.1.6.5 Sensitivity Analysis

The primary factor affecting both space and time is the sequencing
of calls to GETCOR and RETCOR. If the calls are interspersed and random,
then a free list is created. As the free list gets longer, the time to process
and the number of blocks of storage required increase linearly.

3.1.7.1 Approach

The Dispatching cluster controls the assignment of the processor
to executive and application tasks. Tasks to be run in executive mode are
queued (by interrupt handlers) through calls (some inline) to QUEUE.
Whenever an executive task completes it returns through the DSP program
DQUEUE, which dispatches the next executive task in that queue until it
is empty.

When all executive tasks are complete, the dispatcher either
restarts the application task which was interrupted or starts a new, more
urgent application task.
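
The queue-then-dispatch flow described above can be sketched as follows. This is an illustration only: first-in, first-out ordering of the executive queue, the representation of executive tasks as callables, and the way the more urgent application task is passed in are assumptions made for the example, not details taken from the DSP programs.

```python
from collections import deque


class Dispatcher:
    def __init__(self):
        self.excq = deque()      # executive task queue (FIFO order assumed)

    def queue(self, executive_task):
        """Loose analogue of QUEUE: called by interrupt handlers."""
        self.excq.append(executive_task)

    def dqueue(self, interrupted_task, more_urgent_task=None):
        """Loose analogue of DQUEUE: run executive tasks until the queue
        is empty, then pick the application task to run next."""
        while self.excq:
            task = self.excq.popleft()
            task(self)           # an executive task may queue further tasks
        if more_urgent_task is not None:
            return ("start", more_urgent_task)
        return ("restart", interrupted_task)


ran = []
dispatcher = Dispatcher()
dispatcher.queue(lambda d: ran.append("notify"))
dispatcher.queue(lambda d: d.queue(lambda d2: ran.append("follow-up")))
outcome = dispatcher.dqueue("nav")
```

Executive work queued while the queue is being drained is still completed before any application task is restarted or started.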


3.1.7.2 Definition of DSP Activities

The dispatcher tends to be the most active cluster in the executive.
Entries via DQUEUE occur at each interrupt (200 bus and 90 clock in
processor 0; 200 bus and 67 clock in processor 1) and at each task
completion (157 in processor 0 and 224 in processor 1). Hence, assuming
worst case, tasks are restarted 290 and 267 times a second, and started
157 and 224 times a second, in processors 0 and 1 respectively. Restarting
interrupted or preempted tasks involves manipulation of the stack and
restoring of all registers and the machine state. Starting of new tasks
requires a call to the schedule function AUTASK and allocation of a stack
frame.

DSP also contains assembly programs ENABLE and DISABL which
are called by other clusters.
3.1.7.3 DSP Baseline Detailed Timing Statistics

3.1.7.3.1 Processor 0

Procedure                          Processing Time    Occurrences       Total
                                   (microseconds)     /Second           (milliseconds)
J73I Programs
  QUEUE                             20.6                   95             1.957
  DQUEUE
    Each entry                      16.2
      Task ends                                           157
      DTF interrupt returns                               200
      TIM interrupt returns                                90
                                                          447             7.241
    Each item queued                24.8
      DTF queueing                                         95
      Interrupts in executive                              87 (30% of 290)
                                                          182             4.514
                                                                         11.755
  DSPTSK
    Interrupt returns               26.0                  290             7.540
    Task starts                     19.8 + AUTASK         157             3.109
                                                                         10.649
                                                                   + 157 AUTASK
Assembler Programs
  CEXCPR                             6.0                  182             1.092
  RSTART                            24.6                  290             7.134
  START                              8.8                  157             1.382
  ENDTSK                             7.8                  157             1.225
  ENABLE                             4.4                  185             0.814
  DISABL                             4.4                  629             2.768

Total                                                                    38.776
Overhead                                                                  3.9%
3.1.7.3.2 Processor 1

Procedure                          Processing Time    Occurrences       Total
                                   (microseconds)     /Second           (milliseconds)
J73I Programs
  QUEUE                             20.6                  110             2.266
  DQUEUE
    Each entry                      16.2
      Task ends                                           224
      DTF interrupt returns                               200
      TIM interrupt returns                                67
                                                          491             7.954
    Each item queued                24.8
      DTF queueing                                        110
      Interrupts in executive                              80 (30% of 267)
                                                          190             4.712
                                                                         12.666
  DSPTSK
    Interrupt returns               26.0                  267             5.500
    Task starts                     19.8 + AUTASK         224             4.435
                                                                   + 224 AUTASK
Assembler Programs
  CEXCPR                             6.0                  190             6.840
  RSTART                            24.6                  267             6.568
  START                              8.8                  224             1.971
  ENDTSK                             7.8                  224             1.747
  ENABLE                             4.4                  200             0.880
  DISABL                             4.4                  681             2.996

Total                                                                    45.869
Overhead                                                                  4.6%

3.1.7.4 DSP Baseline Detailed Space Statistics

3.1.7.4.1 Processor 0

                                            Words

J73I Programs
  DSPINT                             78
  QUEUE                              16
  DQUEUE                             36
  DSPTSK                             26
  ENDTSK                             10
  Program Data Space
                                            202
Assembler Programs
  ENABLE/DISABL                       8
  CEXCPR                              6
  START/RSTART                       18
  Machine initialization/
    fault handling                  132
                                            164
Data
  EXCQ - executive task queue         6
  Initialization handler
    transfer addresses               32
  Machine initialization/
    fault handling                   31
                                             69

Total                                       435
Overhead                                    1.3%


3.1.7.4.2 Processor 1

                                            Words

J73I Programs
  Same as Processor 0                       202
Assembler Programs
  Same as Processor 0                       164
Data
  Same as Processor 0                        69

Total                                       435
Overhead                                    1.3%
3.1.7.5 DSP Sensitivity Analysis

Approximately 40% of the time spent in the dispatcher is spent
in queueing and dequeueing executive tasks. This operation is triggered
at each task completion and each interrupt. These occur at the combined
rate of 447 per second in processor 0 and 491 per second in processor 1
(worst case). Approximately 30% is consumed in restarting interrupted
application tasks, and 20% in starting and finishing tasks. The
remaining 10% is consumed in performing the interrupt enable and
disable functions for various J73I programs in the executive.
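
The Processor 0 total in Section 3.1.7.3.1 can be checked by multiplying
each procedure's processing time by its occurrence rate and summing. The
Python sketch below does this; the row labels and function names are
shorthand introduced here for illustration, and the overhead is assumed
to be the fraction of one second (1000 ms) spent in the dispatcher.

```python
# (label, microseconds per occurrence, occurrences per second) for
# Processor 0, copied from the DSP baseline timing table (3.1.7.3.1).
ROWS = [
    ("QUEUE", 20.6, 95),
    ("DQUEUE each entry", 16.2, 447),
    ("DQUEUE each item queued", 24.8, 182),
    ("DSPTSK interrupt returns", 26.0, 290),
    ("DSPTSK task starts", 19.8, 157),  # excludes the AUTASK call itself
    ("CEXCPR", 6.0, 182),
    ("RSTART", 24.6, 290),
    ("START", 8.8, 157),
    ("ENDTSK", 7.8, 157),
    ("ENABLE", 4.4, 185),
    ("DISABL", 4.4, 629),
]


def total_ms(rows):
    """Milliseconds of dispatcher time consumed per second of operation."""
    return sum(us * rate for _, us, rate in rows) / 1000.0


def overhead_percent(rows):
    """Dispatcher time as a percentage of one second (1000 ms)."""
    return total_ms(rows) / 1000.0 * 100.0


if __name__ == "__main__":
    print(f"total    = {total_ms(ROWS):.3f} ms per second")
    print(f"overhead = {overhead_percent(ROWS):.1f}%")
```

The sum differs from the tabulated 38.776 ms only in the last digits,
because the table rounds each row before adding.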


3.  1.  8 SSM  Baseline  Statistics 

Secondary  Storage  Management  is  not  implemented. 

3.  1.  9 MPL  Baseline  Statistics 

Multiprocessor  Locking  is  not  implemented. 

3.1.10 Bus Traffic

There are two events which control the amount of activity on the
bus. The first is passing control of the bus. This requires sending two
data words, one command word and one status word. In addition, there
is a delay of 57.2 microseconds each time control is passed. Control
is passed 100 times each second.

The second event causing bus traffic is the transmission of data.
Each transmission causes a command word, a status word and the data
words to be sent. There is a transmission for each command word
allocated on calls to SEND. The number of data words sent is derived
from the data associated with each command word.

Each data word requires 20 microseconds to send.

The statistics for each processor are summarized in the following
sections. The transmissions include both device and interprocessor
data.
3.1.10.1 Processor 0

                        Occurrences   Words Transmitted
                        /Second       /Second

Passing control         100            400
Data transmissions      507           1014
Data words                            3000
                                      4414 words
                                      at 20 microseconds/word
                                      88.28 milliseconds
Idle time                              5.72 milliseconds

Total                                 94.00 milliseconds
Overhead                               9.4%

3.1.10.2 Processor 1

                        Occurrences   Words Transmitted
                        /Second       /Second

Passing control         100            400
Data transmissions      275            550
Data words                            6500
                                      7450 words
                                      at 20 microseconds/word
                                      149.0 milliseconds
Idle time                               5.72 milliseconds

Total                                 154.72 milliseconds
Overhead                               15.5%
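
The bus overhead figures for both processors follow from the word counts
and timings given in Section 3.1.10. The sketch below reproduces that
arithmetic in Python; the constant and function names are hypothetical,
and it assumes, as the tables do, that every word on the bus (command,
status or data) takes 20 microseconds.

```python
WORD_TIME_US = 20.0          # time to send one word on the bus
PASS_DELAY_US = 57.2         # delay each time bus control is passed
PASSES_PER_SECOND = 100      # control is passed 100 times each second
WORDS_PER_PASS = 4           # two data words, one command, one status
WORDS_PER_TRANSMISSION = 2   # command and status word; data counted apart


def bus_overhead(transmissions, data_words):
    """Return (words/second, busy+idle ms/second, overhead percent)."""
    words = (PASSES_PER_SECOND * WORDS_PER_PASS
             + transmissions * WORDS_PER_TRANSMISSION
             + data_words)
    busy_ms = words * WORD_TIME_US / 1000.0
    idle_ms = PASSES_PER_SECOND * PASS_DELAY_US / 1000.0
    total_ms = busy_ms + idle_ms
    return words, total_ms, total_ms / 1000.0 * 100.0


if __name__ == "__main__":
    for name, transmissions, data_words in (("Processor 0", 507, 3000),
                                            ("Processor 1", 275, 6500)):
        words, total_ms, pct = bus_overhead(transmissions, data_words)
        print(f"{name}: {words} words, {total_ms:.2f} ms, {pct:.1f}%")
```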

3.1.10.3 Interprocessor Bus Traffic

3.1.10.3.1 Processor 0 to Processor 1

                              Occurrences   Words Transmitted
                              /Second       /Second

Data transmissions            60            120
Data words                                  561
Negligible asynchronous
  transmissions
                                            681

3.1.10.3.2 Processor 1 to Processor 0

Negligible - all asynchronous transmissions

Total of all interprocessor traffic         681 words
                                            at 20 microseconds/word
                                            13.62 milliseconds

Overhead                                    1.4%


3.  2 Tuning  for  HOL  Inefficiencies 
3.  2.  1 Overview 

This  section  provides  examples  of  efficiency  gain  by  recoding 
J73I  executive  procedures  into  DAIS  assembly  language.  The  approach 
taken  is  to  recode  procedures  such  that  they  could  be  inserted  into  the 
executive  as  a replacement  for  the  J73I  procedures.  Further,  recoding 
is  done  with  the  same  limitations  that  apply  to  the  compilers;  primarily, 
procedure  linkage  conventions  must  be  adhered  to,  procedure  calls  must 
be  made,  and  state  variables  may  be  modified  across  procedure  calls. 

The  intention  of  this  section  is  to  obtain  an  understanding  of  the 
extra  OSC  executive  overhead  associated  with  inefficiencies  in  code 
generated  by  the  compiler. 

3.  2.  2 Examples 

Examples have been selected from three executive function
clusters: CLOCKQ from the TIM cluster, SEND from the DTE cluster
and QUEUE and DQUEUE from the DSP cluster. These examples were
selected as representative of all features of the code that comprises
the executive. Timing statistics are based on the system performance
parameters of processor 0.

3.  2.  2.  1 CLOCKQ 

The handcoded version of CLOCKQ is shown in Appendix A,
Figure A.1. Overhead comparisons between the baseline and handcoded
versions are as follows:

                              Time/Second      Space
                              (milliseconds)   (words)

Baseline Version              45.539           182
Handcoded Version             20.085           102
Percentage Reduction over
  Baseline Version            55.9%            44.0%

3.2.2.2 SEND

The handcoded version of SEND is shown in Appendix A, Figure A.2.

Overhead comparisons between the baseline and handcoded versions
are as follows:

                              Time/Second      Space
                              (milliseconds)   (words)

Baseline Version              37.900           240
Handcoded Version             21.123           149
Percentage Reduction over
  Baseline Version            44.3%            37.9%

3.2.2.3 QUEUE and DQUEUE

The handcoded versions of QUEUE and DQUEUE are shown in
Appendix A, Figure A.3.

Overhead comparisons between the baseline and handcoded
versions are as follows:

                              Time/Second      Space
                              (milliseconds)   (words)

Baseline Version              13.712           61
Handcoded Version              8.504           39
Percentage Reduction over
  Baseline Version            38.0%            36.1%
3.2.3 Summary

The effect on the OSC executive of eliminating compiler code
inefficiency can be estimated as a 48.2% reduction in time and a 40.0%
reduction in space. This estimate is derived from the composite
efficiency gain shown in the previous three examples.

3.  3 Tuning  by  Reducing  Generality 
3.  3.  1 Overview 

This  section  provides  examples  of  efficiency  gain  resulting  from 
removing  executive  generality  not  required  for  a specific  DFG. 

3.  3.  2 Examples 

The examples used and the efficiency comparisons made are based
on the results of Section 3.2, Tuning for HOL Inefficiencies.

3.3.2.1 CLOCKQ

The  excess  generality  of  CLOCKQ  with  respect  to  the  processor  0 
DFG  requirements  exists  only  in  the  ability  to  handle  alarm  clocks.  The 
recoded  version  of  CLOCKQ  without  alarm  clock  capability  is  shown  in 
Appendix  A,  Figure  A.  4.  Comparisons  between  the  handcoded  version 
of  Section  3.  2 and  the  recoded  version  of  this  section  are  as  follows: 


                              Time/Second      Space
                              (milliseconds)   (words)

Handcoded version             20.085           102
No excess generality version  19.374            98
Percentage Reduction over
  handcoded version           3.5%             3.9%

3.  3.  2.  2 SEND 

The processor 0 DFG does not require SEND to support unallocated
master receive and remote receive subaddresses. The recoded version of
SEND with the above capability removed is shown in Appendix A,
Figure A.5. Comparisons between the handcoded version and the version
of this section are as follows:

                              Time/Second      Space
                              (milliseconds)   (words)

Handcoded version             21.123           149
No excess generality version  19.915            88
Percentage Reduction over
  handcoded version           5.7%             40.9%

3.3.  2.  3 QUEUE  and  DQUEUE 

QUEUE  and  DQUEUE  contain  no  excess  generality. 

3.  3.  3 Summary 

The removal of excess generality from the executive is estimated
to yield an efficiency improvement of 3.9% in time and 22.4% in space.
However, the space reduction is closely coupled with the complexity of
the DFG. Extremely complex DFGs may require all the features of
the executive and allow for no generality reduction, while simple DFGs
could allow a space reduction on the order of 50% or more.

The structure of the executive is such that additional capability
can be added relatively easily with an impact on space rather than time.
The results of this section tend to validate that statement.
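
The composite figures quoted in this summary can be reproduced from the
per-procedure tables of Sections 3.2.2 and 3.3.2. The sketch below is
one way to do so in Python; the dictionary layout and function name are
illustrative only, and it assumes the composite is formed by summing
time and space over the three examples before taking the percentage.

```python
# (milliseconds per second, words) for each example, from the tables above.
HANDCODED = {
    "CLOCKQ": (20.085, 102),
    "SEND": (21.123, 149),
    "QUEUE/DQUEUE": (8.504, 39),
}
NO_EXCESS_GENERALITY = {
    "CLOCKQ": (19.374, 98),
    "SEND": (19.915, 88),
    "QUEUE/DQUEUE": (8.504, 39),  # no excess generality to remove
}


def composite_reduction(before, after):
    """Percentage reduction in summed time and summed space."""
    time_before = sum(t for t, _ in before.values())
    time_after = sum(t for t, _ in after.values())
    space_before = sum(s for _, s in before.values())
    space_after = sum(s for _, s in after.values())
    time_pct = (time_before - time_after) / time_before * 100.0
    space_pct = (space_before - space_after) / space_before * 100.0
    return time_pct, space_pct


if __name__ == "__main__":
    time_pct, space_pct = composite_reduction(HANDCODED, NO_EXCESS_GENERALITY)
    print(f"time: {time_pct:.1f}%  space: {space_pct:.1f}%")
```

Summed this way, the three examples fall from 49.712 ms and 290 words to
47.793 ms and 225 words, which gives the 3.9% and 22.4% stated above.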


3.4 Tuning for HOL Language Deficiencies

3.4.1 Overview

This section provides examples of efficiency gain that would result
from additional language features in the J73I compiler. The primary
language features desired for the OSC executive are a built-in function
(BIF) capability and a more powerful inline capability.

The built-in function capability would allow certain machine
instructions to be accessed by the HOL directly rather than through
procedure calls to assembly language programs. The efficiency gain is
largely derived by having the necessary instructions generated in-line,
thereby circumventing the procedure linkage overhead and the required
register flushing. The built-in functions most desired for the OSC
executive are interrupt enabling and disabling, jump (branch to an
address), and double precision fixed point arithmetic (although a double
precision fixed point language feature would be more desirable).

For inline expansion, an INLINE directive would be much more desirable
than the somewhat obscure and inflexible DEFINE.

3.4.2 Examples

The examples used in this section are derived from the same
procedures used in the two previous sections.

3.4.2.1 CLOCKQ

CLOCKQ makes extensive use of double precision fixed point
arithmetic. The large efficiency gain observed is due primarily to
performing this arithmetic inline. The recoded version of this section
is shown in Appendix A, Figure A.6.

Comparisons between the CLOCKQ version of Section 3.3 and the
recoded version of this section are as follows:
3.4.2.2 SEND

The recoding of SEND to eliminate HOL deficiencies shows limited
improvement. The improvement comes mainly from the ability to manage
registers more effectively due to the absence of procedure calls. The
recoded version of SEND is shown in Appendix A, Figure A.7.

Comparisons between the SEND version of the previous section
and the version of this section are as follows:

                                  Time/Second      Space
                                  (milliseconds)   (words)

No excess generality version      19.915           88
HOL deficiencies removed version  16.957           83
Percentage reduction over
  no excess generality version    14.9%            5.7%

3.  4.  2.  3 QUEUE  and  DQUEUE 

The  recoding  of  QUEUE  and  DQUEUE  showed  no  change  in  QUEUE 
and  a marked  improvement  in  DQUEUE  due  to  including  DSPTSK  inline 
and  making  a direct  jump  to  the  queued  procedure  rather  than  through  a 
procedure  call  to  CEXCPR. 

Comparisons  between  the  handcoded  versions  of  QUEUE  and 
DQUEUE  from  Section  3.  2 (no  change  was  observed  by  eliminating  excess 
generality)  and  the  version  of  this  section  are  as  follows: 

                                  Time/Second      Space
                                  (milliseconds)   (words)

Handcoded version                 8.504            39
HOL deficiencies removed version  4.386            27
Percentage reduction over
  handcoded version               48.4%            30.8%

3.4.3 Summary

The removal of the language deficiencies described in this section
from the J73I compiler would yield an estimated efficiency improvement
of 34.0% in time and 23.6% in space for the OSC executive.
3.5 Final Tuning

3.5.1 Overview

The tuning steps described in sections 3.2 through 3.4 yielded
an efficiency gain of 67.5% in time and 64.4% in space for the selected
examples. The composite results of the stepwise tuning for CLOCKQ,
SEND, QUEUE, and DQUEUE are shown below.

                                  Time/Second      Space
                                  (milliseconds)   (words)

Baseline                          97.151           483
Tuned by removing the
  inefficiencies and deficiencies
  and excess generality           31.566           172
Percentage reduction over
  baseline version                67.5%            64.4%

The following performance can be expected by applying the results
of the three-step tuning process to the executives of processor 0 and
processor 1. The space statistics reflect a reduction in program space
only. Data structures were not tuned.

                 Processor 0              Processor 1
                 Executive Overhead       Executive Overhead

                 Time       Space         Time       Space

Baseline         37.1%      33.7%         41.0%      29.2%
Tuned*           12.1%      20.1%         13.3%      17.3%

* Tuning limited to removing HOL inefficiencies and deficiencies and
  excess generality.


The final tuning phase involves global modifications to the
executives that result from the three-step tuning process.
These global modifications are described in the following
subsection.

3.5.2 Final Tuning Description

The final tuning process involves examining the executive as a
whole, and studying the flow of data and control between its parts in an
effort to reduce or eliminate the overhead involved in communication
among these various parts. In performing this process, the functional
boundaries delineated by function clusters and individual procedures may
be blurred and in some cases lost altogether. The result of final tuning
is therefore not only a smaller, faster executive but also an executive
whose parts are intricately interwoven. Such an executive is likely to
be difficult to understand, modify, and debug.

In the case of tuning the OSC executive for the DAIS mission, the
final tuning process was not necessary to meet mission requirements but
was performed in order to demonstrate the method and measure its
efficacy. The estimated reduction in executive time overhead resulting
from the final tuning process was about 8% of the tuned overhead, or
only 2.7% of the baseline version overhead. That is, 3.9% of the total
overhead reduction is attributable to final tuning and 96.1% to all other
tuning methods. In view of this and the drawbacks of final tuning
mentioned above, the process should be applied only when mission
requirements cannot otherwise be met.

The types of optimization applied in the final tuning process
include:

• Global allocation of registers throughout the executive.

  Global allocation of registers reduces and in some cases
  eliminates the need for movement of operands back and forth
  between the register file and main memory, even over procedure
  calls. State-of-the-art compilers are capable of this type of
  optimization but generally limit the allocation to local
  variables within a single procedure.

  Examples of this allocation are the dedication of a register
  to the active node stack (ANODE) throughout ITC processing and
  the dedication of a register to the executive control queue
  (EXCQ) throughout the executive.

• Inline expansion of calls between function clusters.

  Here the meaning of the term inline expansion is extended to
  mean the absorption of the actions of part or all of one
  function cluster into another function cluster. For example,
  in the final tuned version, CLOCKQ (in the TIM function
  cluster) performs the functions of SIGNL (in the ITC function
  cluster) by stacking the nodes activated by each clock firing
  and then transferring control to ITC to process the active
  node stack when all fired clocks have been processed. Note
  this is not the same as a direct inline expansion of the call
  to SIGNL in CLOCKQ. A direct expansion would result in a call
  to ITC to process active nodes for every clock firing, thus
  incurring more TIM/ITC linkage overhead.

• Elimination of procedure call/return linkages where possible.

  In many cases executive procedures follow one another in a
  predictable sequence. In such cases it is possible to reduce
  overhead by substituting direct transfers for the usual
  procedure call/return mechanism. For example, in the baseline
  executive, ENDTK always calls ACTIVE and then DQUEUE, which
  either transfers control to another executive task or to the
  dispatcher but never returns. There is no need for ACTIVE to
  return to ENDTK or for DQUEUE to return to ACTIVE. In the
  final tuned version, ENDTK transfers directly to active node
  processing, which in turn transfers directly to DQUEUE when
  the active node stack is empty. DQUEUE itself transfers
  directly to the next executive procedure on the executive
  control queue (EXCQ) or, if the queue is empty, to DSPTSK,
  which dispatches an application task. Naturally some
  flexibility may be lost in that such procedures are
  inseparably tied to a particular flow of control and therefore
  may not be called individually.
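
The difference between a direct inline expansion of the SIGNL call and
the stacking scheme described above can be illustrated by counting
transfers of control between TIM and ITC. The Python sketch below is a
toy model only: the function names and the representation of a clock
firing as a list of activated node names are assumptions made for
illustration, not the executive's actual data structures.

```python
def direct_expansion(firings):
    """Hand control to ITC once for every clock firing."""
    transfers = 0
    processed = []
    for nodes in firings:
        transfers += 1              # one TIM -> ITC linkage per firing
        processed.extend(nodes)     # ITC processes this firing's nodes
    return transfers, processed


def stack_then_transfer(firings):
    """Stack the nodes of all fired clocks, then hand control to ITC once."""
    stack = []
    for nodes in firings:
        stack.extend(nodes)         # stacking done inline in CLOCKQ
    transfers = 1 if stack else 0   # single TIM -> ITC linkage
    processed = []
    while stack:
        processed.append(stack.pop())
    return transfers, processed


if __name__ == "__main__":
    firings = [["A", "B"], ["C"], ["D", "E", "F"]]
    print(direct_expansion(firings)[0])     # transfers with direct expansion
    print(stack_then_transfer(firings)[0])  # transfers with stacking
```

Both variants process the same set of nodes; only the number of
cluster-to-cluster transfers, and hence the linkage overhead, differs.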


Final Tuned Executive

Processor                 Time/Second      Space
                          (milliseconds)   (words)

Processor 0
  J73I Programs                            1678
  Assembly Language
    Programs              110.730          2007
  Tables                                   3558
  Total                   110.730          7243
  Overhead                11.1%            22.1%

Processor 1
  J73I Programs                            1708
  Assembly Language
    Programs              113.220          2005
  Tables                                   2482
  Total                   113.220          6285
  Overhead

The tuned executives were created by using each of the tuning
methods previously described. Summary statistics are provided in
Figures 4 and 5. The following sections provide detailed statistics
for each cluster.


Cluster          Processor 0      Processor 1

ITC               23.628           24.741
TIM               16.610           11.150
DAC                9.001           17.903
SCH                8.330           12.141
DTE               39.354           32.306
MSM                0.000            0.000
DSP               13.811           14.982

Total            110.734          113.223
Overhead

Fig. 4

Final Tuned Executive Timing Statistics


Cluster          Processor 0      Processor 1

ITC              3257             2315
TIM               368              288
DAC               699              624
SCH                60               60
DTE              2589             2082
MSM                83              729
DSP               187              187

Total            7243             6285
Overhead         22.1%            19.2%

Fig. 5

Final Tuned Executive Space Statistics
3.6.1 ITC Final Tuned Statistics

Processor                 Time/Second      Space
                          (milliseconds)   (words)

Processor 0
  J73I Programs                             970
  Assembly Language
    Programs              23.628            602
  Tables                                   1685
  Total                   23.628           3257
  Overhead                2.4%             9.9%

Processor 1
  J73I Programs                             970
  Assembly Language
    Programs              24.741            602
  Tables                                    743
  Total                   24.741           2315
  Overhead                2.5%             7.1%

3.6.1.1 ITC Tuning Approach

The reductions in ITC functionality described below account for much
of the space reduction and a significant improvement in speed. All
frequently called procedures were written in assembly language. In
general, calls to small procedures in other function clusters were
expanded into inline assembly code. In particular, this was done for
calls to ENDTSK, DROP, RETCOR, and TIME. Internal function cluster
procedures ACTIVE, EPINS, EGATES, and DGATES were also expanded inline.
The interface procedure SIGNL was eliminated altogether, thus requiring
the TIM cluster to perform this function inline. Procedure linkages were
simplified and in some cases eliminated altogether. For example, each
active node processing procedure simply sends control to the active node
procedure for the next node on the active stack. A dummy node always at
the bottom of the stack automatically transfers control to DQUEUE when
all active nodes have been processed. Registers were allocated on an
executive wide basis, and saving and reloading around procedure calls
was eliminated.
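
The dummy-node arrangement can be sketched in a high-level language as
follows. This is a minimal Python model, not the assembly language
implementation: the helper names are hypothetical, each "procedure"
returns the next procedure to run, and a small loop stands in for the
direct jumps of the tuned executive. The point it shows is that no
procedure tests for an empty stack, because the dummy node at the bottom
ends the chain.

```python
def make_node_procedure(name, trace):
    """Build a stand-in for one active node processing procedure."""
    def procedure(stack):
        trace.append(name)      # placeholder for the node's real work
        return stack.pop()      # transfer to the next node; no empty check
    return procedure


def make_dummy_node(trace):
    """Build the dummy node that always sits at the bottom of the stack."""
    def procedure(stack):
        trace.append("DQUEUE")  # all active nodes done: go to DQUEUE
        return None
    return procedure


def run_active_stack(stack):
    """Loop standing in for direct transfers between procedures."""
    procedure = stack.pop()
    while procedure is not None:
        procedure = procedure(stack)


def demo():
    trace = []
    stack = [make_dummy_node(trace)]        # dummy node at the bottom
    for name in ("NTK", "NDI", "NRL"):      # nodes stacked as activated
        stack.append(make_node_procedure(name, trace))
    run_active_stack(stack)
    return trace


if __name__ == "__main__":
    print(demo())
```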

Data structures owned by ITC were not modified (e.g., node table,
pin table, links vector, gate table). However, some reduction in table
space (especially the node table) could have been achieved by removing
data items required only by unsupported functions (e.g., ND'ETIME, which
contains a task node's maximum execution time).

3.6.1.2 Functional Differences with Baseline

• Only node types TK, TKS, CSN, GT, SI, DI, DC, IV, RL are required.

• Of the node types implemented, only GT checks for disabled inputs.

• Only the '+' link of the CSN node is implemented.

• No consume links are posted.

• Only pin and links block gates are implemented.

• A links block gate may modify only the LINKS'BLK links block.

• A maximum of four output links are allowed for each node.

• RL node does not invoke EACCSS.


3.6.1.3 ITC Final Tuned Detailed Timing Statistics

3.6.1.3.1 Processor 0

Procedure              Processing Time     Occurrences   Total
                       (microseconds)      /Second       (milliseconds)

Assembler Programs

ITCACT
  Each entry           18.2                 90            1.638
  Each notify          11.4                211            2.405
                                                          4.043
ITCPAS
  Each entry           18.2                  5            0.091
  Each notify          11.4                  5            0.057
                                                          0.148
NOTIFY
  Pin notify           31.4                 57            1.790
  Node notify          20.4                159            3.244
                                                          5.034
ENDTK                  37.4 + EACCSS       133            4.974 + 133 EACCSS
ENDTKS                 74.6 + EACCSS        24            1.790 + 24 EACCSS
NCSN                   16.4                  8            0.131
NDC
  32/5                                     32x2
    No output          18.2                 54            0.983
    Output             26.2                 10            0.262
  32/10                                     16
    No output          18.2                 11            0.200
    Output             26.2                  5            0.131
  5/4                                        5
    No output          18.2                  1            0.018
    Output             26.2                  4            0.105
                                                          1.699
NDI                    24.4 + BACCSS        29            0.708 + 29 BACCSS
                            + EACCSS                            + 29 EACCSS
NRL                    13.8 + BACCSS        60            0.828 + 60 BACCSS
                            + SEND                              + 60 SEND
NSI                    16.4                  8            0.131
NTK                    14.6 + BACCSS       157            2.292 + 157 BACCSS
                            + SCHED                             + 157 SCHED
PLINKS
  Each 0 link post      2.0                 98            0.196
  Each 1 link post     12.4                 59            0.732
  Each 3 link post     28.8                 32            0.922
                                                          1.850

Total                                                    23.628 + 246 BACCSS
                                                                + 186 EACCSS
Overhead                                                  2.4%

3.6.1.3.2 Processor 1

Procedure              Processing Time     Occurrences   Total
                       (microseconds)      /Second       (milliseconds)

Assembler Programs

ITCACT
  Each entry           18.2                 60            1.092
  Each notify          11.4                105            1.197
                                                          2.289
ITCPAS
  Each entry           18.2                 50            0.910
  Each notify          11.4                 66            0.752
                                                          1.662
NOTIFY
  Static pin notify    31.4                  2            0.063
  Static gate notify   10.0                  2            0.020
  Static node notify   20.4                105            2.142
  Dynamic pin notify   30.8                 61            1.879
  Dynamic node notify  41.2                  1            0.041
                                                          4.145
ENDTK                  37.4                121            4.525
ENDTKS                 74.6                103            7.684
NTK                    14.6                224            3.270
NDI                    24.4                  1            0.024
NRL                    13.8                  2            0.028
PLK0                    2.0                160            0.320
PLK1                   12.4                 64            0.794

Total                                                    24.741
Overhead                                                  2.5%
3.6.1.4 ITC Final Tuned Detailed Space Requirements

3.6.1.4.1 Processor 0

                                            Words

J73I Programs
  ITCINT                            100
  ELGATE                            128
  ILINK                             106
  ENDLG                              24
  ICNDW                              20
  DLGATE                            126
  RLNK                              128
  DNDLG                              28
  DCNDW                              22
  ENBLGATE                           34
  DSBLGATE                           34
  SIGNLEVENT                         34
  SIGNLDEVENT                        34
  SIGAC                              20
  Program Data Space                132
                                            970
Assembler Programs
  ITCACT                             30
  ITCPAS                             30
  NOTIFY                             84
  NDI                                24
  NIV                                20
  NGT                                22
  NRL                                18
  NCSN                               24
  NDC                                26
  EGATE                              14
  DGATE                              14
  DQSIGNAL                           30
  DLINKS                             26
  PLINKS                             46
  ENDTK                              42
  ENDTKS                            104
  NSI                                16
  NTK                                16
  Program Data Space                 16
                                            602
Data
  Same as baseline                         1685

Total                                      3257
Overhead                                   9.9%

3.6.1.4.2 Processor 1

                                            Words

J73I Programs
  Same as processor 0                       970
Assembler Programs
  Same as processor 0                       602
Data
  Same as baseline                          743

Total                                      2315
Overhead                                   7.1%


3.6.1.5 ITC Sensitivity Analysis

The time overhead in the final tuned ITC is about 28% of the
baseline version's overhead in both processors. A slightly higher
percentage (60%-70%) of ITC time is spent in processing active nodes
than in the baseline. This is due to absorption of segments of other
functions into inline code, in particular the posting of clock pins. Most
of the active node processing is of task nodes; hence a doubling of the
number of tasks or their rates of execution would result in approximately
a 50% increase in ITC overhead, or 1.2% of total processor overhead.
The remaining 30% to 40% is consumed in processing I/O complete
notifications.

Program space is dramatically reduced by the elimination of
unnecessary generality and the compaction resulting from hand coding.
3.6.2 TIM Final Tuned Statistics

Processor                 Time/Second      Space
                          (milliseconds)   (words)

Processor 0
  J73I Programs                            158
  Assembly Language
    Programs              16.610            70
  Tables                                   140
  Total                   16.610           368
  Overhead                1.7%             1.1%

Processor 1
  J73I Programs                            158
  Assembly Language
    Programs              11.150            70
  Tables                                    60
  Total                   11.150           288
  Overhead                1.1%             0.9%

3.6.2.1 TIM Tuning Approach

The functions TIME, TGTR, TSUM, TDIF were expanded as inline
assembly code wherever called and therefore are not implemented as
procedures in this cluster. SETCLOCK and SETALRM were merged
into CLOCKQ, which was recoded in assembly language. The interface
between CLOCKQ and ITC was modified so that the functions performed
by SIGNL in ITC were done inline in CLOCKQ. Instead of calling ACTIVE
for each enabled pin, CLOCKQ stacks the activated nodes and exits by
transferring control to the active node procedure for the first node on
the stack.

A more efficient method of handling the timer interrupt (INTCKA)
was devised. This method of handling interrupts is discussed in section
3.6.7.1.

The TIM owned data structures were not modified.
3.6.2.2 Functional Differences with Baseline

• One shot alarms are not implemented.


3.6.2.3 TIM Final Tuned Detailed Timing Statistics

3.6.2.3.1 Processor 0

Procedure                     Processing Time   Occurrences   Total
                              (microseconds)    /Second       (milliseconds)

Assembler Programs

INTCKA
  From application task       24.0               81            1.944
  From executive task         15.0                9            0.135
                                                               2.079
CLOCKQ
  Entries                     13.2               90            1.188
  Clock firings               42.8              127            5.436
  Multiple pin clocks         18.6               96            1.786
  Pass first clock on list    16.2              127            2.057
  Pass additional clocks
    on list                    8.0              508            4.064
                                                              14.531

Total                                                         16.610
Overhead                                                       1.7%


3.6.2.3.2 Processor 1

Procedure                     Processing Time   Occurrences   Total
                              (microseconds)    /Second       (milliseconds)

Assembler Programs

INTCKA
  From application task       24.0               60            1.440
  From executive task         15.0                9            0.135
                                                               1.575
CLOCKQ
  Entries                     13.2               67            0.884
  Clock firings               42.8              103            4.408
  Multiple pin clocks         18.6               52            0.967
  Pass first clock on list    16.2              103            1.668
  Pass additional clocks
    on list                    8.0              206            1.648
                                                               9.575

Total                                                         11.150
Overhead                                                       1.1%



3.6.2.4 TIM Final Tuned Detailed Space Statistics

3.6.2.4.1 Processor 0

                                            Words

J73I Programs
  TIMINT
  STARTCLOCK                        112
  Program Data Space                 26
                                            158
Assembler Programs
  CLOCKQ                                     70
Data
  Same as baseline                          140

Total                                       368
Overhead                                    1.1%

3.6.2.4.2 Processor 1

                                            Words

J73I Programs
  Same as processor 0                       158
Assembler Programs
  Same as processor 0                        70
Data
  Same as baseline                           60

Total                                       288
Overhead                                    0.9%


3.6.2.5 TIM Sensitivity Analysis

The time overhead in the final tuned version is about 20% of
the untuned version's overhead. Searching the clock list to insert a
clock which has just fired accounts for about 30% of the processing.
Additional independent clocks result in more clocks firing per second
and more clocks to search through on the queue after each firing.
Hence, TIM overhead is a function of the product of firings/second
and the number of independent clocks.
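
The dependence on the product of firings per second and number of clocks
can be seen in a toy model. The sketch below is an illustration under
stated assumptions, not the CLOCKQ algorithm: it assumes periodic clocks
kept on a list ordered by next firing time, and a linear search from the
head of the list to reinsert each clock after it fires. The function
name and the millisecond periods are hypothetical.

```python
def simulate(periods_ms, horizon_ms=1000):
    """Return (firings, list entries passed over) within the horizon."""
    # Each entry is (next firing time, clock id), kept in sorted order.
    queue = sorted((period, cid) for cid, period in enumerate(periods_ms))
    firings = 0
    passed_over = 0
    while queue and queue[0][0] <= horizon_ms:
        fire_time, cid = queue.pop(0)            # clock at the head fires
        firings += 1
        entry = (fire_time + periods_ms[cid], cid)
        position = 0
        # Linear search for the reinsertion point of the fired clock.
        while position < len(queue) and queue[position] <= entry:
            passed_over += 1
            position += 1
        queue.insert(position, entry)
    return firings, passed_over


if __name__ == "__main__":
    for count in (4, 8):
        firings, passed = simulate([10] * count)
        print(f"{count} clocks: {firings} firings, {passed} entries searched")
```

In this model, doubling the number of equal-rate clocks doubles the
firings and roughly doubles the search length per firing, so the search
work grows by about a factor of four.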

3.6.3 DAC Final Tuned Statistics

Processor                 Time/Second      Space
                          (milliseconds)   (words)

Processor 0
  J73I Programs                            172
  Assembly Language
    Programs              9.001            200
  Tables                                   327
  Total                   9.001            699
  Overhead                0.9%             2.1%

Processor 1
  J73I Programs                            172
  Assembly Language
    Programs              17.903           200
  Tables                                   252
  Total                   17.903           624
  Overhead                1.8%             1.9%



3.6.3.1 DAC Tuning Approach

All frequently called procedures were recoded in assembly
language. The calls to procedures RETCOR and GETCOR in MSM were
coded inline as simple stacking/unstacking operations (all dynamic storage
blocks being the same size). Internal procedure CPYD was coded inline
as a MOV instruction. The access functions required were BWD, ESWS,
BSRS, CSTN, CST, USTN, UST and UDT.

The processing of access lists was handled in a manner similar to
the handling of the active node stack in ITC. BACCSS and EACCSS start
the processing by transferring to the begin or end access procedure,
respectively, for the first access controller. Each access procedure then
transfers control to the appropriate access procedure for the next controller
on the list. Each list is terminated by a dummy access controller which
causes control to be returned directly to the caller of BACCSS or EACCSS.

The DAC owned data structures were not modified.
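
The chained handling of access lists can be sketched as follows. This
is a minimal illustration in Python, not the report's assembly code:
each handler does its work and transfers directly to the next
controller's handler, and a dummy terminator ends the walk and returns
to the original caller. The function name make_chain and the log list
are hypothetical.

```python
from typing import Callable, List


def make_chain(names: List[str], log: List[str]) -> Callable[[], int]:
    """Build a chain of 'begin access' handlers ending in a dummy terminator."""

    def terminator() -> int:
        # Dummy access controller: ends the walk and returns to the
        # caller of the first handler.
        return len(log)

    nxt: Callable[[], int] = terminator
    for name in reversed(names):
        # Default arguments bind the current name and successor.
        def handler(name: str = name, nxt: Callable[[], int] = nxt) -> int:
            log.append(f"begin {name}")
            return nxt()  # transfer to the next controller's procedure
        nxt = handler
    return nxt


if __name__ == "__main__":
    trace: List[str] = []
    first = make_chain(["A", "B", "C"], trace)
    print(first(), trace)
```

The caller never loops over the list or tests for its end; the
terminator replaces the end-of-list check.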

3.6.3.2 Functional Differences with Baseline

• Only the functions BACCSS, EACCSS, BARD, ERD, BWD,
  BSRS, ESWS, CSTN, CST, USTN, UST and UDT are implemented.


3.6.3.3 DAC Final Tuned Detailed Timing Statistics

3.6.3.3.1 Processor 0

Procedure                 Processing Time   Occurrences   Total
                          (microseconds)    /Second       (milliseconds)

Assembler Programs
  BACCSS                       8.6             224           1.926
  EACCSS                       8.6             164           1.410
  USTN                        15.4             151           2.325
  UST                         11.8              10           0.118
  UDT                         16.2              32           0.518
  CSTN                        17.6               8           0.140
  CST                         14.4              25           0.360
  BWD                         18.2              32           0.582
  ESWS                        15.6              34           0.530
  BSRS                        17.0              10           0.170
  Words copied via
  MOV instruction              2.0             401           0.802

Total                                                        9.001

Overhead                                                     0.9%
3.6.3.3.2 Processor 1

Procedure                 Processing Time   Occurrences   Total
                          (microseconds)    /Second       (milliseconds)

Assembler Programs
  BACCSS                       8.6             227           1.952
  EACCSS                       8.6             225           1.935
  USTN                        15.4             103           1.586
  UST                         11.8               5           0.059
  BARD                        16.6             352           5.843
  ERD                         17.0             352           5.984
  ESWS                        13.6              40           0.544

Total                                                       17.903

Overhead                                                     1.8%

3.6.3.4 DAC Final Tuned Detailed Space Statistics

3.6.3.4.1 Processor 0

                                        Words

J73I Programs
  DACINT                                 150
  Program Data Space                      22

Assembler Programs
  BACCSS                                   8
  EACCSS                                   8
  BARD                                    16
  EARD                                    22
  CST                                     14
  USTN                                    14
  UST                                     12
  UDT                                     14
  BWD                                     18
  BRS                                     16
  EWS                                     14
  CSTN                                    16
  Program Data Space                      18
                                         200

Data
  Same as baseline                       327

Total                                    699

Overhead                                 2.1%

3.6.3.4.2 Processor 1

                                        Words

J73I Programs
  Same as processor 0                    172

Assembler Programs
  Same as processor 0                    200

Data
  Same as baseline                       252

Total                                    624

Overhead                                 1.9%

3.6.3.5 DAC Sensitivity Analysis

The time overhead in the final tuned version of DAC is about 30%
of that in the baseline version. Copying of data in static storage blocks
accounts for 10% of DAC overhead in processor 0 and 5% in processor 1.
In both processors the rate at which static data is copied is relatively
low. If the system response requirement and sharing of data were such
that 400 static blocks averaging 10 words had to be copied every second,
then the DAC overhead in processor 0 would be doubled.
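
The doubling estimate can be checked with a short calculation. The
sketch below uses figures from the processor 0 timing table (2.0
microseconds per word copied via MOV, 401 words copied per second, and
9.001 milliseconds of DAC time per second); the function names are
hypothetical, and the model assumes that only the copying term changes.

```python
MOV_US_PER_WORD = 2.0      # per word copied via MOV (timing table)
TUNED_DAC_MS = 9.001       # processor 0 DAC time per second, tuned
TUNED_WORDS_PER_SEC = 401  # words copied per second in the measured mission


def copy_ms(words_per_sec: float) -> float:
    """Time per second spent copying static data, in milliseconds."""
    return words_per_sec * MOV_US_PER_WORD / 1000.0


def dac_ms(blocks_per_sec: int, words_per_block: int) -> float:
    """DAC time per second if only the static copy rate changes."""
    other_ms = TUNED_DAC_MS - copy_ms(TUNED_WORDS_PER_SEC)
    return other_ms + copy_ms(blocks_per_sec * words_per_block)


if __name__ == "__main__":
    heavy = dac_ms(400, 10)
    print(round(heavy, 3), round(heavy / TUNED_DAC_MS, 2))
```

Copying 4000 words per second costs 8 milliseconds, which is close to
the entire tuned DAC time of about 9 milliseconds, so the total rises
to roughly 1.8 times its measured value.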
3.6.4 SCH Final Tuned Statistics


Processor                         Time/Second       Space
                                  (milliseconds)    (words)

Processor 0
  J73I Programs                                        16
  Assembly Language Programs         8.330             44
  Tables
  Total                              8.330             60
  Overhead                           0.8%             0.2%

Processor 1
  J73I Programs                                        16
  Assembly Language Programs        12.141             44
  Tables
  Total                             12.141             60
  Overhead                           1.2%             0.2%

3.6.4.1 SCH Tuning Approach

The outside calls to procedures DROP and AUTASK were recoded
as inline assembler code. The spare time computing procedure (SPRTIM)
was dropped since it is not required in a checked out system. SCHED was
coded in assembler language, thus eliminating calls to TGTR and TSUM.

Data structures used by SCH were not modified.

3.6.4.2 Functional Differences

• Spare task execution time is not computed.

• Overload flag is not used.


3.6.4.3 SCH Final Tuned Detailed Timing Statistics


3.6.4.3.1 Processor 0

Procedure                       Processing Time   Occurrences   Total
                                (microseconds)    /Second       (milliseconds)

Assembler Programs
  SCHED entries                     28.4             157           4.459
  Active tasks not preemptable       9.8             157           1.359
  More urgent tasks                  8.0             314           2.512

Total                                                              8.330

Overhead                                                           0.8%

3.6.4.3.2 Processor 1

Procedure                       Processing Time   Occurrences   Total
                                (microseconds)    /Second       (milliseconds)

Assembler Programs
  SCHED entries                     28.4             224           6.362
  Active tasks not preemptable       9.8             224           2.195
  More urgent tasks                  8.0             448           3.584

Total                                                             12.141

Overhead                                                           1.2%


3.6.4.4 SCH Final Tuned Detailed Space Statistics


3.6.4.4.1 Processor 0

                                        Words

J73I Programs
  SCHINT                                  12
  Program Data Space                       4
                                          16

Assembler Programs
  SCHED                                   44

Data
  None

Total                                     60

Overhead                                 0.2%


3.6.4.4.2 Processor 1

                                        Words

J73I Programs
  Same as processor 0                     16

Assembler Programs
  Same as processor 0                     44

Data
  None

Total                                     60

Overhead                                 0.2%
3.6.4.5 SCH Sensitivity Analysis

The time overhead in the final tuned version of SCH is approximately
[percentage illegible in source] of the untuned version. In the tuned
version, as in the baseline version, loop setup and actual insertion
account for approximately 50% of the overhead. The overhead is directly
proportional to the number of tasks scheduled per second times the
average number of tasks which are in the queue and are more urgent.
In contrast to the clock queue, which gets longer with each additional
clock, the scheduler queue tends to remain nearly empty. Tasks are
removed from the queue as fast as they are inserted except at brief
beats where several tasks are scheduled nearly simultaneously.
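
The role of the more urgent tasks in the queue can be illustrated with
a small sketch of an ordered insertion. It is not the SCHED assembly
code: the queue representation, the convention that a larger number
means more urgent, and the function name are assumptions. The two cost
constants are taken from the SCH timing table (28.4 microseconds per
SCHED entry and 8.0 microseconds per more urgent task).

```python
from typing import List, Tuple

SCHED_ENTRY_US = 28.4   # each SCHED entry (timing table)
MORE_URGENT_US = 8.0    # each more urgent task passed (timing table)


def sched(queue: List[Tuple[int, str]], urgency: int, task: str) -> float:
    """Insert a task behind every queued task that is at least as urgent.

    The queue is kept most-urgent-first; a larger number means more
    urgent (an assumed convention). Returns the estimated cost in
    microseconds for this insertion.
    """
    pos = 0
    while pos < len(queue) and queue[pos][0] >= urgency:
        pos += 1
    queue.insert(pos, (urgency, task))
    return SCHED_ENTRY_US + MORE_URGENT_US * pos


if __name__ == "__main__":
    q: List[Tuple[int, str]] = []
    for urgency, name in ((5, "nav"), (3, "display"), (9, "weapon")):
        print(name, sched(q, urgency, name))
    print(q)
```

When the queue is nearly empty, as the text describes, almost every
insertion pays only the fixed entry cost.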

3.6.5 DTF Final Tuned Statistics


Processor                         Time/Second       Space
                                  (milliseconds)    (words)

Processor 0
  J73I Programs                      0                270
  Assembly Language Programs        39.354           1051
  Tables                                             1268
  Total                             39.354           2589
  Overhead                           3.9%             7.9%

Processor 1
  J73I Programs                      0                300
  Assembly Language Programs        32.306           1139
  Tables                                              643
  Total                             32.306           2082
  Overhead                           3.2%             6.4%

3.6.5.1 Tuning Approach


DTF tuning consisted of deleting the generalities associated with
dynamic subaddress allocation, priority, bus control passing, and dynamic
storage when they were not required, including ENABLE, DISABL, QUEUE,
DQUEUE, GETCOR and RETCOR inline, and straight line coding of the loops
in DTFPAS. Space was reduced additionally by removing head only code in
processor 1 and non-head only code in processor 0.

3.6.5.2 Functional Differences with Baseline

The following are differences between the baseline executive and the
final tuned executive in processor 0:

• Control is passed on a round robin basis.

• Only one remote receive subaddress field.

• One dummy word is transmitted when control is passed.

• Dynamic storage for remote receive not supported.

• Unallocated subaddresses for master receive not
  supported.

The same differences occur in processor 1. The following additional
differences exist:

• Dynamic/unallocated subaddresses for master transmit
  not supported.

• Dynamic storage remote receive is supported.
3.6.5.3 DTF Final Tuned Detailed Timing Statistics


3.6.5.3.1 Processor 0

Procedure                        Processing Time   Occurrences   Total
                                 (microseconds)    /second       (milliseconds)

Assembler Programs

DTFACT
  ITCACT to be queued                21.6              90           1.944
  No ITCACT notification
  required                           16.0              10           0.160
  Dynamic storage data
  transmitted                        23.8              32           0.762
                                                                    2.866

DTFPAS
  Each entry                         26.4             100           2.640
  ITCPAS to be queued                 9.2               5           0.046
  Each block of data received         8.8               5           0.044
                                                                    2.730

DTFKEY                               29.0             100           2.900

SENDYN                               29.0              32           0.928

SEND
  Allocated entry                    28.8             195           5.620
  Allocated additional
  command word/entry                 27.2             262           7.130
  Allocated notify                    9.6             195           1.870
  Unallocated entry                  48.8              40           1.952
  Unallocated additional
  command word/entry                 47.2              10           0.472
  Unallocated notify                  9.6              20           0.190
                                                                   17.170

BCI2
  Take control of bus                79.2             100           7.920
  Relinquish control of bus          48.4             100           4.840
                                                                   12.760

Total                                                              39.354

Overhead                                                            3.9%
3.6.5.3.2 Processor 1

Procedure                        Processing Time   Occurrences   Total
                                 (microseconds)    /second       (milliseconds)

Assembler Programs

DTFACT
  ITCACT to be queued                15.6              60           0.936
  No ITCACT notification
  required                           10.0              40           0.400
                                                                    1.336

DTFPAS
  Each entry                         73.4             100           7.340
  ITCPAS to be queued                 9.2              50           0.460
  Each block of statically
  allocated data received             8.8               5           0.044
  Each block of dynamically
  allocated data received            19.6              60           1.176
                                                                    9.020

DTFKEY                               15.8             100           1.580

SEND
  Each entry                         25.2             110           2.770
  Each additional command
  word/entry                         23.6             165           3.890
  Each notify                         9.2             103           0.950
                                                                    7.610

BCI2
  Take control of bus                79.2             100           7.920
  Relinquish control of bus          48.4             100           4.840
                                                                   12.760

Total                                                              32.306

Overhead                                                            3.2%

3.6.5.4 DTF Final Tuned Detailed Space Requirements


3.6.5.4.1 Processor 0

                                        Words

J73/I Programs
  DTFINT
  FLIP
                                         270

Assembler Programs
  SEND
  SENDYN
  DTFPAS
  DTFACT
  DTFKEY
  BCIU Initialization
  BCIU interrupt handlers
                                        1051

Data
  DEVTBL                                  36
  CWTBL                                  636
  CWQCWP                                  41
  MNOTFY                                  64
  RNOTFY                                  16
  RCODE                                    5
  FLPDEV                                  40
  Miscellaneous Items J73I                 8
  Subaddress Pointer Words               112
  Storage for Executive 'Signals'          6
  Scratch Storage                         32
  Miscellaneous Items Assembler           32
  Command Word Storage                   240
                                        1268

Total                                   2589

Overhead                                 7.9%
3.6.5.4.2 Processor 1

                                        Words

J73/I Programs
  DTFINT                                 132
  FLIP                                   168
                                         300

Assembler Programs
  SEND                                    34
  DTFACT                                  16
  DTFPAS                                 274
  DTFKEY                                  20
  BCIU Initialization                     42
  BCIU interrupt handlers                753
                                        1139

Data
  DEVTBL                                  40
  CWTBL                                  164
  MNOTFY                                  16
  RNOTFY                                  64
  RCODE                                   15
  FLPDEV                                  30
  Misc Items J73I                          8
  Subaddress Pointer Words                92
  Storage for Executive 'Signals'         12
  Scratch Storage                         32
  Misc Items Assembler                    30
  Command Word Storage                   140
                                         643

Total                                   2082

Overhead                                 6.4%
3.6.5.5 DTF Sensitivity Analysis


The time overhead in the final tuned version of the DTF is approximately
45% of the processor 0 baseline and 35% of the processor 1 baseline.
The time reduction for the final tuned DTF over the baseline was less
dramatic than for other function clusters because a significant portion of
the baseline was already in assembly language. The final tuned assembly
language portion was over 72% of the time overhead of the baseline version,
with the reduction due almost entirely to a reduction in generality.

The time overhead of the final tuned version is sensitive to the
same parameters as the baseline version, namely, the number of times
control is passed per second (BCI2) and the number of different
interprocessor data blocks that can be received (DTFPAS).

An alternative approach to DTFPAS may be desirable for executives
where the number of different interprocessor data blocks is large (on the
order of twenty or more).

DTFPAS currently has all remote receive (interprocessor) tagwords
as zero prior to entering remote mode. DTFPAS examines each
tagword for non-zero (data received) each time another processor
relinquishes master control. An alternative approach would have the
processor in master control allocate a sequential subaddress, starting at one,
for each processor to processor transmission. This would allow DTFPAS
to terminate the tagword examination on encountering the first non-zero
tagword. DTFPAS overhead would be dramatically reduced for large
numbers of different interprocessor data blocks, with increased overhead
in dealing with dynamic storage and extracting the data block identification
from the data.


3.6.6 MSM Final Tuned Statistics


Processor                         Time/second       Space
                                  (milliseconds)    (words)

Processor 0
  J73I Programs                                         0
  Assembly Language Programs                           14
  Tables                                               69
  Total                              0                 83
  Overhead                           0%               0.2%

Processor 1
  J73I Programs                                         0
  Assembly Language Programs                           14
  Tables                                              715
  Total                              0                729
  Overhead                           0%               2.2%

3.6.6.1 Functional Differences with Baseline

The tuned version on each processor is capable of allocating only
blocks 34 words long.


3.6.6.2 Tuning Approach

The cluster maintains a list of 34 word blocks. Each time a block
is allocated, it is removed from the front of the list; when returned, it is
placed at the front of the list.

The code to perform these activities has been placed inline in all
places where calls are required, except in initialization. The time and
space for these has been included in the statistics for the cluster
manipulating the storage. Each inline call to GETCOR is 6.2 microseconds
and to RETCOR is 6.4 microseconds, yielding 0.04% overhead in processor 0
and 0.06% overhead in processor 1.
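
The list discipline described here (remove from the front on
allocation, place at the front on return) can be sketched as a simple
free list of fixed-size blocks. The Python class below is only an
illustration: the class name, the use of word 0 of a free block as the
link, and the -1 end marker are assumptions; only the 34-word block
size and the GETCOR/RETCOR names come from the report.

```python
from typing import List, Optional

BLOCK_WORDS = 34  # every block is the same size, as in the tuned MSM


class BlockPool:
    """Fixed-size block allocator using a front-of-list free list."""

    def __init__(self, n_blocks: int) -> None:
        self.memory: List[int] = [0] * (n_blocks * BLOCK_WORDS)
        # Word 0 of each free block holds the address of the next free
        # block, or -1 at the end of the list (assumed encoding).
        self.head: Optional[int] = None
        for addr in range((n_blocks - 1) * BLOCK_WORDS, -1, -BLOCK_WORDS):
            self.retcor(addr)

    def getcor(self) -> int:
        """Remove the block at the front of the list and return its address."""
        if self.head is None:
            raise MemoryError("no free blocks")
        addr = self.head
        nxt = self.memory[addr]
        self.head = None if nxt < 0 else nxt
        return addr

    def retcor(self, addr: int) -> None:
        """Place a returned block at the front of the list."""
        self.memory[addr] = -1 if self.head is None else self.head
        self.head = addr


if __name__ == "__main__":
    pool = BlockPool(2)
    a = pool.getcor()
    b = pool.getcor()
    pool.retcor(a)
    print(a, b, pool.getcor())
```

Both operations touch only the list head and one word of the block,
which is why their cost does not depend on the order of calls.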
3.6.6.3 MSM Final Tuned Detailed Timing Statistics

3.6.6.3.1 Processor 0

The times have been included in the calling cluster.

3.6.6.3.2 Processor 1

The times have been included in the calling cluster.

3.6.6.4 MSM Final Tuned Detailed Space Statistics

3.6.6.4.1 Processor 0

                                        Words

Assembler Programs
  MSMINT                                  14
  The GETCOR and RETCOR space has
  been included in the calling clusters.

Data
  Two blocks                              69

Total                                     83

3.6.6.4.2 Processor 1

                                        Words

Assembler Programs
  MSMINT                                  14
  The GETCOR and RETCOR space has
  been included in the calling clusters.

Data
  Twenty-one blocks                      715

Total                                    729

3.6.6.5 Sensitivity Analysis

The  sequence  of  calls  does  not  affect  timing  since  blocks  are 
always  removed  from  the  front  of  the  list  and  placed  on  the  front  when 
returned.  However,  the  call  sequence  does  affect  space  since  there  must 
be  enough  blocks  to  satisfy  all  outstanding  requests. 
3.6.7 DSP Final Tuned Statistics


Processor                         Time/Second       Space
                                  (milliseconds)    (words)

Processor 0
  J73I Programs                                        92
  Assembly Language Programs        13.811             26
  Tables                                               69
  Total                             13.811            187
  Overhead                           1.4%             0.6%

Processor 1
  J73I Programs                                        92
  Assembly Language Programs        14.982             26
  Tables                                               69
  Total                             14.982            187
  Overhead                           1.5%             0.6%

3.6.7.1 DSP Tuning Approach

A more efficient interrupt/return linkage was devised. Register 14
was dedicated to the save area/stack pointer (CURSAV). When the processor
is in the application, register 14 points to the top of the stack. When the
executive is entered (either via an interrupt or when an application task
completes), register 14 is complemented. Hence, when an interrupt is
taken, the interrupt handler can immediately determine whether the
processor was in the task or executive state by testing the sign of register 14,
and thereby avoids saving registers when in the executive state. Use of
register 14 also eliminates the restriction in memory layout imposed by
the baseline executive. When the executive is exited via DSPTSK,
register 14 is again complemented to indicate return to the task state.


The interface functions ENABLE, DISABL, QUEUE and ENDTSK
were eliminated in favor of inline code in all calling procedures.
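
The register 14 convention can be illustrated with a few lines of
Python. This sketch assumes 16-bit registers and stack addresses below
0x8000, so that a valid stack pointer always has a clear sign bit;
neither assumption is stated in the report, and the function names are
hypothetical.

```python
WORD_MASK = 0xFFFF  # assumed 16-bit registers
SIGN_BIT = 0x8000   # sign bit of a 16-bit word


def complement(reg: int) -> int:
    """One's complement of a 16-bit register value."""
    return ~reg & WORD_MASK


def in_executive(reg14: int) -> bool:
    """True when register 14 holds a complemented stack pointer."""
    return bool(reg14 & SIGN_BIT)


if __name__ == "__main__":
    stack_top = 0x1234                 # task state: plain stack pointer
    entered = complement(stack_top)    # executive entry complements it
    print(in_executive(stack_top), in_executive(entered))
    print(hex(complement(entered)))    # DSPTSK restores the pointer
```

Because complementing twice restores the original value, the same
register carries both the state flag and the stack pointer, with no
separate state word to maintain.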
3.6.7.2 Functional Differences

• None.


3.6.7.3 DSP Final Tuned Timing Statistics


3.6.7.3.1 Processor 0

Procedure                        Processing Time   Occurrences   Total
                                 (microseconds)    /Second       (milliseconds)

Assembler Programs

DQUEUE
  Each entry                          6.8
    Task ends                                         157
    DTF interrupt returns                             200
    TIM interrupt returns                              90
                                                      447           3.040
  Each item queued                    6.8
    DTF queuing                                        95
    Interrupts in exec                                 29
    (10% of 290)
                                                      124           0.843
                                                                    3.883

DSPTSK
  Interrupt returns                   7.2             290           2.088
  Task starts                         7.8             157           1.225
                                                                    3.313

RSTART                               20.0             290           5.800

START                                11.2             157           1.758

Total                                                              14.754

Overhead                                                            1.5%

3.6.7.3.2 Processor 1

Procedure                        Processing Time   Occurrences   Total
                                 (microseconds)    /Second       (milliseconds)

Assembler Programs

DQUEUE
  Each entry                          6.8
    Task ends                                         224
    DTF interrupt returns                             200
    TIM interrupt returns                              67
                                                      491           3.339
  Each item queued                    6.8
    DTF queueing                                      110
    Interrupts in exec                                 27
    (10% of 267)
                                                      137           0.932
                                                                    4.271

DSPTSK
  Interrupt returns                   7.2             267           1.922
  Task starts                         7.8             224           1.747
                                                                    3.669

RSTART                               20.0             267           5.340

START                                11.2             224           2.509

Total                                                              15.789

Overhead                                                            1.6%

3.6.7.4 DSP Final Tuned Detailed Space Statistics


3.6.7.4.1 Processor 0

                                        Words

J73I Programs
  DSPINT                                  78
  Program Data Space                      14
                                          92

Assembler Programs
  DQUEUE                                   6
  DSPTSK                                   4
  RSTART                                   6
  START                                   10
                                          26

Data
  Same as baseline                        69

Total                                    187

Overhead                                 0.6%


3.6.7.4.2 Processor 1

                                        Words

J73I Programs
  Same as processor 0                     92

Assembler Programs
  Same as processor 0                     26

Data
  Same as baseline

Total                                    187

Overhead                                 0.6%
3.6.7.5 DSP Sensitivity Analysis

The time overhead for the final tuned version of DSP is approximately
37% of the baseline version's overhead. Since parts of the dispatcher were
already hand coded in assembler language, the improvement in speed is
less marked than that for other clusters. The dispatcher overhead depends
primarily on the number of interrupts and the number of task starts per
second. Each additional 100 interrupts/second consumes about 0.4% of
processor time. Each additional 100 task starts per second consumes
about 0.3% of processor time.
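
These two sensitivities can be approximated from the per-procedure
times in the DSP timing tables. The sketch below is an interpretation,
not the report's own derivation: it assumes that each interrupt costs
one DQUEUE entry, one RSTART and one DSPTSK interrupt return, plus a
queued item for the 10% of interrupts taken in the executive, and that
each task start costs one DQUEUE entry, one DSPTSK task start and one
START. All names are hypothetical.

```python
# Per-procedure times in microseconds, from the DSP timing tables.
DQUEUE_ENTRY_US = 6.8
DQUEUE_ITEM_US = 6.8
DSPTSK_INT_RETURN_US = 7.2
DSPTSK_TASK_START_US = 7.8
RSTART_US = 20.0
START_US = 11.2
EXEC_INTERRUPT_FRACTION = 0.10  # interrupts taken while in the executive


def percent_per_100(us_each: float) -> float:
    """Processor time, in percent, consumed by 100 events per second."""
    return 100 * us_each / 1_000_000 * 100


def interrupt_us() -> float:
    # Assumed composition of the cost of one interrupt.
    return (DQUEUE_ENTRY_US + RSTART_US + DSPTSK_INT_RETURN_US
            + EXEC_INTERRUPT_FRACTION * DQUEUE_ITEM_US)


def task_start_us() -> float:
    # Assumed composition of the cost of one task start.
    return DQUEUE_ENTRY_US + DSPTSK_TASK_START_US + START_US


if __name__ == "__main__":
    print(round(percent_per_100(interrupt_us()), 2))
    print(round(percent_per_100(task_start_us()), 2))
```

Under these assumptions the estimates come to roughly 0.35% and 0.26%,
the same order as the 0.4% and 0.3% quoted in the text.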

3.6.8 SSM Final Tuning Statistics

Secondary Storage Management is not implemented.

3.6.9 MPL Final Tuning Statistics

Multiprocessor Locking is not implemented.

3.6.10 Final Tuned Bus Traffic

The only difference between the baseline and tuned bus control is
that only one data word is required to pass control. This reduced
bus overhead by 0.4%.

3.7 Conclusions

The approach taken allowed small and efficient baseline executives
to be written. These were then tuned significantly for the DAIS mission.
The executives were tuned by approximately 72% for time and 47% for
space. The following subsections present the time and space statistics
by function cluster and tuning method.


3.7.1 Statistical Reduction Allocated by Function Cluster

This section provides a series of figures which illustrate the time
and space overhead allocated to each function cluster, and the overhead
reduction between the baseline and tuned executives. Figures 6 and 7
show the processor 0 time and space overhead, and Figures 8 and 9 show
the time and space overhead for processor 1.


3.7.2 Statistical Reduction Allocated by Tuning Method

The primary tuning methods employed were described in Sections
3.3, 3.4, and 3.5. This section generalizes from the results obtained
in the sample programs studied, and uses the statistics to find the
improvement over the baseline executive. The first tuning method, hand
coding to eliminate compiler inefficiency, was shown to reduce the
processor 0 baseline executive time by 48.2% and space by 40.0%. The
removal of excess generality reduced baseline executive time by 2.0% and
space by 13.4%. It should be noted that the removal of excess generality
is heavily dependent upon the executive requirements of the particular
mission. The removal of HOL deficiencies reduced baseline executive time
by 16.9% and space by 11.0%. Final tuning reduced baseline time overhead
by 2.7%, but increased space by 1.9%. This implies the following percent
of total reduction allocated to each tuning method:


Tuning Method                     Time/second       Space
                                  (milliseconds)    (words)

HOL inefficiency                     69.1%            64.0%
Excess generality                     2.9%            21.4%
HOL deficiency                       24.1%            17.6%
Final Tuning                          3.9%            -3.0%

It is interesting to note that only 2.9% of the time savings is due
to reducing generality. This suggests the possibility of not reducing
generality (unless the 21.4% space savings is required) and maintaining a
mission independent, tuned executive. In addition, final tuning may not
be desired, as discussed in Section 3.5.2.
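
The allocation of the total time reduction to each tuning method
follows from dividing each method's reduction by the sum of all four.
The sketch below repeats that arithmetic for the time column. The
reductions of 48.2%, 2.0% and 2.7% are quoted in the text; the HOL
deficiency figure is partly illegible in the source scan and is taken
here as 16.9%, which is an assumption. The function name is
hypothetical.

```python
from typing import Dict

# Reduction of processor 0 baseline executive time, in percent.
TIME_REDUCTION: Dict[str, float] = {
    "HOL inefficiency": 48.2,
    "Excess generality": 2.0,
    "HOL deficiency": 16.9,  # digit partly illegible in the scan; assumed
    "Final tuning": 2.7,
}


def shares(reductions: Dict[str, float]) -> Dict[str, float]:
    """Percent of the total reduction attributable to each method."""
    total = sum(reductions.values())
    return {name: 100.0 * value / total for name, value in reductions.items()}


if __name__ == "__main__":
    for name, share in shares(TIME_REDUCTION).items():
        print(f"{name:20s} {share:5.1f}%")
```

With these inputs the computed shares agree with the tabulated time
percentages to within about 0.1 percentage point.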
Appendix A


EXECUTIVE TUNING EXAMPLES


A.1 CLOCKQ - TUNED FOR HOL INEFFICIENCY


CLOCKQ  EQU      $
        STM      R5,CLOCSV           . 5.4
        JS       R2,ENABLE           . 2.0
        L        R3,TIM'NEXT         . 2.0
        DL       R0,TIM'DUE,R3       . 2.4
        DST      R0,PARM2            . 2.6
CLOC05  L        R0,TIM'NEXT,R3      . 2.0
        ST       R0,TIM'NEXT         . 2.2
        L        R0,TIM'EVENT,R3     . 2.0
        JC       EZ,CLOC10           . 1.6/2.0
        ST       R0,PARM1            . 2.2
        LIM      R15,PARML1          . 1.6
        JS       R2,SIGNL            . 2.0
CLOC10  DL       R0,TIM'PERIOD,R3    . 2.4
        DC       R0,TIMEO            . 2.8
        JC       EQ,CLOC20           . 1.6/2.0
        DST      R0,PARM1            . 2.6
        LIM      R15,PARML2          . 1.6
        JS       R2,TSUM             . 2.0
        DL       R0,PARM3            . 2.4
        DST      R0,TIM'DUE,R3       . 2.6
        ZR       R4                  . 1.4
CLOC15  LR       R5,R4               . 1.4
        L        R4,TIM'NEXT,R5      . 2.0
        LIM      R0,TIM'DUE,R4       . 1.6
        ST       R0,PARML4           . 2.2
        LIM      R15,PARML3          . 1.6
        JS       R2,TGTR             . 2.0
        L        R0,PARM1            . 2.0
        JC       GZ,CLOC15           . 1.6/2.0
        ST       R4,TIM'NEXT,R3      . 2.2
        ST       R3,TIM'NEXT,R5      . 2.2
CLOC20  L        R3,TIM'NEXT         . 2.0
        DL       R0,TIM'DUE,R3       . 2.4
        DC       R0,PARM2            . 2.8
        JC       EQ,CLOC05           . 1.6/2.0
        DST      R0,PARM2            . 2.6
        LIM      R15,PARML2          . 1.6
        JS       R2,SETCLOCK         . 2.0
        L        R0,PARM1            . 2.0
        JC       EZ,CLOC05           . 1.6/2.0
        LM       R5,CLOCSV           . 5.8
        J        0,R2                . 2.0
PARML1  CONSTANT PARM1
PARML2  CONSTANT PARM2
        CONSTANT PARM1
PARML3  CONSTANT PARM3               . 92.0  EACH ENTRY
PARML4  STORAGE  1                   . 60.6  EACH CLOCK W/SAME TIME
        CONSTANT PARM1               .       AS PREVIOUS
        EVEN                         . 14.6  EACH CLOCK ON QUEUE
PARM1   STORAGE  2                   .       W/EARLIER TIME
PARM2   STORAGE  2
PARM3   STORAGE  2
CLOCSV  STORAGE  6


SL'NU 

StNPt^S 

NTURKU 

NTUKRA 

SEND1(i» 

SEND15 

NTAKRU 

MRUOYN 

SEND2t> 


A.2 SEND - TUNED FOR HOL INEFFICIENCY


tou 

$ 

1. 

«!■>,«, R15 

• 

1. 

H IrV'.RlS 

• 

2.** 

vie 

K/.SKNDJb 

• 

1 .o/2.*» 

HvST 

K^fSr.NUSV 

• 

2,0 

JvS 

H2,'.MSABL 

• 

2.*> 

i 

K^*  MCW 

• 

2.B 

AIM 

H2»-/ 

• 

I ,P 

ST 

H2,  mcw 

• 

2.2 

('I, 

KH,Cw'RCV,RJ 

• 

2,4 

DST 

KO,C«RCV,Ri 

• 

2,0 

L 

kis,cw'typ,rj 

• 

2.B 

L 

Kl  b/ JMPTBL/RI5 

• 

2.R 

J 

i>,  K 1 S 

• 

2.B 

A 

RB, URPISA 

• 

2. <4 

ST 

MH,CwRrV,H2 

• 

2.2 

IM 

URPISA 

• 

3.2 

L 

KtbfUSTSA 

• 

2.0 

C IM 

RIS, IUPTSA 

0 

2,0 

JC 

Gt,SKNl)Ib 

• 

1 .0/2.0 

AR 

R1  ,R15 

• 

1.4 

ST 

HI  ,CwTRA,Ri 

• 

2.2 

IM 

IISTSA 

• 

3.2 

L 

Kl  ,MRUn 

• 

2,0 

JC 

(;z»srNDU> 

• 

i ,0/2,0 

AlH 

RlS*HTSAP2>MTSAPt 

• 

1,0 

L 

RB,CWDADK,R) 

• 

2,0 

ST 

kb,«tsapi ,hlS 

• 

2,2 

J 

MOTUY 

• 

2,0 

AIM 

K2, 2 

• 

l.e> 

ST 

K2,Mc.* 

• 

2,2 

L 

R1  ,cwoptk 

• 

2,0 

ST 

k3,CNU,Rl 

• 

2,2 

IR 

CWJPTK 

• 

3,2 

2R 

R 3 

• 

1,4 

J 

^OTI^y 

• 

2,0 

A 

RO,UKPlvSA 

• 

2,0 

ST 

m*,CWRCV,H2 

• 

2,2 

IM 

URPISA 

• 

3,2 

J 

NOTIFY 

• 

2,0 

L 

RtS.UOR.SA 

• 

2,0 

AK 

RP,  RIS 

• 

1.4 

ST 

KP,CwRCV,R2 

• 

2.2 

L 

KU, MHuri 

t 

2,0 

JC 

K/,SKNl)2») 

• 

1, 0/2,0 

A 1“ 

RIb»MRSAP2-MHSAPl 

• 

1.0 

L 

Kt>,MR5APl  ,Ki  S 

• 

2.0 

ST 

K**,C^'DA0R,R.3 

• 

2.2 

IM 

IIDRSA 

• 

3.2 

J 

NOTIFY 

• 

2.0 

A-  I 
A.2 SEND - TUNED FOR HOL INEFFICIENCY (Continued)


MKUSTA 

L 

nn.USRSA 

AH 

PPI.HIS 

ST 

Hk'.C*JPCV, 

IM 

llSPSA 

1.4 

^.2 

3.2 


SEM025 

NOTIFY 


L 

HI .MPUFl 

• 

2.0 

JC 

GZ.SEND2S 

• 

1. 6/2.0 

AIM 

Htb,MPSAp2-MKSAPl 

• 

1.6 

ii 

HF#CW*DAIJP.P3 

• 

2.0 

ST 

PO.MPSAPi.HlS 

• 

2.2 

L 

Ktf.CN'COllE.Ri 

• 

2.0 

JC 

F.Z,SEND3P 

• 

1. 6/2.0 

li 

P2,MN1NDX 

• 

2.0 

1 

Hi  ,CWDAI>R,P3 

# 

2.H 

osr 

HO, MNC0DE,P2 

• 

2.6 

AIM 

P2.2 

• 

1.6 

ANI)M 

R7, MNMAX 

• 

1.6 

sr 

k2, "NINOX 

• 

2.2 

JS 

h2,F'^AHLK 

• 

■2.0 

L 

H3,C.J'NXr.K3 

• 

2.0 

JC 

MFZ.SENDPS 

• 

1. 6/2,0 

Dl,  P2 

.SENDSV 

. 2.4 

SENl}35 

J 0. 

K2 

. 2.0 

* 

EVEN 

SENDSV 

STOPAGb, 

2 

41. «) 

US  EC 

ALLOCATED  ENTRY 

constant 

mTUPRU 

28,6 

US  EC 

ALLOC  ADDITION  CN 

JMPTBL 

CONSTANT 

HTUPHA 

64,2 

US  EC 

unallocated  entry 

CON ST An  r 

mtakru 

52. k> 

USEC 

UNALLOC  ADDIT  CN 

constant 

NOTIFY 

11,6 

USKC 

NOTIFICATION 

COnstan  r 

mpudyn 

CONS  tan  r 

MHUSTA 

constan  r 

NOTIFY 

A-4 


cr 
A.3 QUEUE/DQUEUE - TUNED FOR HOL INEFFICIENCY


QUEUE 

EOU 

$ 

L 

Klb»0,Rl5 

• 

2.© 

1 

RlS/M,klS 

• 

2,0 

L 

Kl »CXCU 

• 

2.0 

ST 

R1»EXC'NEXC»H15 

• 

2.2 

ST 

K15,EXCU 

• 

2.2 

J 

• 

2.0 

12.4  USEC 

DQUEUK. 

EOU 

S 

ST 

R?,00(IESV 

• 

2.2 

DULUOP 

JS 

K2,l)tSABL 

• 

2.0 

L 

R1 »EXCU 

• 

2,0 

JC 

EZf  DUIIEMS 

• 

1, 6/2.0 

L 

HiS,EXC'NEXC.Rl 

• 

2.0 

ST 

RW/EXCQ 

• 

2.2 

LIM 

R0,EXC'PRC»RI 

• 

1.6 

ST 

RM, PARMLI 

• 

2.2 

LIM 

Klb,PAKMLl 

« 

1.6 

JS 

R2,CEXCPKC 

• 

2,0 

i) 

DOLOOP 

• 

2.0 

DQUEP5 

JS 

R2,OSPrSK 

• 

2.0 

J1 

DOUESV 

OQUESV 

STORAGE  1 

10.2 

usee 

PLUS 

PARMLi 

STORAGE  1 

15.2 

USEC 

EACH  CEXCPRC  CALL 

A-5 
A.4 CLOCKQ - TUNED FOR EXCESS GENERALITY

CLOCKQ  EQU      $
        STM      R5,CLOCSV           . 5.4
        JS       R2,ENABLE           . 2.0
        L        R3,TIM'NEXT         . 2.0
        DL       R0,TIM'DUE,R3       . 2.4
        DST      R0,PARM2            . 2.6
CLOC05  L        R0,TIM'NEXT,R3      . 2.0
        ST       R0,TIM'NEXT         . 2.2
        L        R0,TIM'EVENT,R3     . 2.0
        JC       EZ,CLOC10           . 1.6/2.0
        ST       R0,PARM1            . 2.2
        LIM      R15,PARML1          . 1.6
        JS       R2,SIGNL            . 2.0
CLOC10  LIM      R0,TIM'PERIOD,R3    . 1.6
        ST       R0,PARML5           . 2.2
        LIM      R15,PARML2          . 1.6
        JS       R2,TSUM             . 2.0
        DL       R0,PARM3            . 2.4
        DST      R0,TIM'DUE,R3       . 2.6
        ZR       R4                  . 1.4
CLOC15  LR       R5,R4               . 1.4
        L        R4,TIM'NEXT,R5      . 2.0
        LIM      R0,TIM'DUE,R4       . 1.6
        ST       R0,PARML4           . 2.2
        LIM      R15,PARML3          . 1.6
        JS       R2,TGTR             . 2.0
        L        R0,PARM1            . 2.0
        JC       GZ,CLOC15           . 1.6/2.0
        ST       R4,TIM'NEXT,R3      . 2.2
        ST       R3,TIM'NEXT,R5      . 2.2
CLOC20  L        R3,TIM'NEXT         . 2.0
        DL       R0,TIM'DUE,R3       . 2.4
        DC       R0,PARM2            . 2.8
        JC       EQ,CLOC05           . 1.6/2.0
        DST      R0,PARM2            . 2.6
        LIM      R15,PARML2          . 1.6
        JS       R2,SETCLOCK         . 2.0
        L        R0,PARM1            . 2.0
        JC       EZ,CLOC05           . 1.6/2.0
        LM       R5,CLOCSV           . 5.8
        J        0,R2                . 2.0
PARML1  CONSTANT PARM1
PARML2  CONSTANT PARM2
PARML5  STORAGE  1
PARML3  CONSTANT PARM3               . 86.4  EACH ENTRY
PARML4  STORAGE  1                   . 55.0  EACH CLOCK W/SAME TIME
        CONSTANT PARM1               .       AS PREVIOUS
        EVEN                         . 14.6  EACH CLOCK ON QUEUE
PARM1   STORAGE  2                   .       W/EARLIER TIME
PARM2   STORAGE  2
PARM3   STORAGE  3
CLOCSV  STORAGE  6


A.5 SEND - TUNED FOR EXCESS GENERALITY


SEND    EQU      $
        L        R15,0,R15           . 2.0
        L        R3,0,R15            . 2.0
        JC       EZ,SEND25           . 1.6/2.0
        DST      R2,SENDSV           . 2.6
SEND05  JS       R2,DISABL           . 2.0
        L        R2,MCW              . 2.0
        AIM      R2,-2               . 1.6
        ST       R2,MCW              . 2.2
        DL       R0,CW'RCV,R3        . 2.4
        DST      R0,CWRCV,R2         . 2.6
        L        R15,CW'TYP,R3       . 2.0
        JC       NEZ,NOTIFY          . 1.6/2.0
        L        R15,USTSA           . 2.0
        CIM      R15,IUDTSA          . 2.0
        JC       LT,SEND10           . 1.6/2.0
        AIM      R2,2                . 1.6
        ST       R2,MCW              . 2.2
        L        R1,CWQPTR           . 2.0
        ST       R3,CWQ,R1           . 2.2
        IM       CWQPTR              . 3.2
        ZR       R3                  . 1.4
        J        NOTIFY              . 2.0
SEND10  AR       R1,R15              . 1.4
        ST       R1,CWTRA,R2         . 2.2
        IM       USTSA               . 3.2
        L        R1,MBUF1            . 2.0
        JC       GZ,SEND15           . 1.6/2.0
        AIM      R15,MTSAP2-MTSAP1   . 1.6
SEND15  L        R0,CW'DADR,R3       . 2.0
        ST       R0,MTSAP1,R15       . 2.2
NOTIFY  L        R0,CW'CODE,R3       . 2.0
        JC       EZ,SEND20           . 1.6/2.0
        L        R2,MNINDX           . 2.0
        L        R1,CW'DADR,R3       . 2.0
        DST      R0,MNCODE,R2        . 2.6
        AIM      R2,2                . 1.6
        ANDM     R2,MNMAX            . 1.6
        ST       R2,MNINDX           . 2.2
SEND20  JS       R2,ENABLE           . 2.0
        L        R3,CW'NXT,R3        . 2.0
        JC       NEZ,SEND05          . 1.6/2.0
        DL       R2,SENDSV           . 2.4
SEND25  J        0,R2                . 2.0
        EVEN
SENDSV  STORAGE  2

                                     . 38.6  USEC  ALLOCATED ENTRY
                                     . 26.8  USEC  ALLOC ADDITION CW
                                     . 59.8  USEC  UNALLOCATED ENTRY
                                     . 48.0  USEC  UNALLOC ADDIT CW
                                     . 11.6  USEC  NOTIFICATION
A.6 CLOCKQ - TUNED FOR HOL DEFICIENCIES


CLOCKU 

Fon 

$ 

f 

STM 

HSfCl.OCSV 

, 5.4 

KNBL 

• 

BIF 

ENABLE 

I. 

R3, riM'NEXT 

. 2.0 

OL 

KH#T1M'DUE,RJ 

, 2.4 

CLOC*’2 

UST 

HH,PARM2 

. 2.6 

CLOCKS 

L 

kkJ,  riM'NEXf  ,«3 

, 2.0 

ST 

kM,HM'NtXT 

. 2.2 

L 

kk1,TlM'EVENr,R3 

, 2.0 

vJC 

KZ.CI.OCIH 

. 1. 6/2.0 

ST 

kD,PARMl 

. 2.2 

LIM 

KlbfPARMLl 

. l.<> 

JS 

R2,SXGNL 

. 2.0 

CLOCU. 

DL 

K»»,TlM'PKRIOD»k3 

. 2.4 

DA 

kO,PARM2 

• 

BIF 

TSUM 

DST 

HB,T1M'DUE,R3 

• 2.6 

?.H 

k4 

. 1.4 

CL0C15 

LR 

kb,R4 

. 1.4 

L 

R4,TlM'NtXr»R5 

. 2.P 

DC 

R0,T1M'DUE,R4 

. 

BIF 

TGTH 

JC 

GT,CL0C15 

. 1. 6/2.0 

sr 

R4,T1M'NEXT,R3 

. 2.2 

ST 

R3,TIM'NEXT,h5 

. 2.2 

CLOC2H 

L 

R3rTIM»NEXr 

. 2.0 

DL 

Rk),TIM'OUE,R3 

. 2.4 

DC 

R0,PARM2 

. 2.8 

JC 

tU»CLOCD5 

. 1. 6/2.0 

ITB 

R2 

. 

BIF 

SETCLOCK 

SH 

R2,R1 

. 

BIF 

SETCLOCK 

JC 

GEZ,CLOC«2 

. 

BIF 

SETCLOCK 

msim 

H2f  in 

. 

BIF 

SETCLOCK 

OTA 

R2 

. 

BIF 

SETCLOCK 

LM 

R5,CLOCSV 

. 5.0 

J 

Vl,R7 

. 2.0 

• 

PARMLl 

CONSTANT 

PARMl 

58.4 

38.6 

EACH  ENTRY 

EACH  CLOCK  M/SAMC  TIME 

PARHl 

CONSTANT 

EVEN 

STOHACF 

PARM2 

2 

5.4 

AS  PREVIOUS 

EACH  CLUCK  ON  QUEUE 
0/EARLIER  TIME 

PARM2 

CLOCSV 

STORAGE 

STORAGE 

2 

6 
A.7 SEND - TUNED FOR HOL DEFICIENCIES


SEND 

eou 

$ 

L 

R1 5.0.H15 

L 

Rt5.0*Rl5 

JC 

EZ.SEND2S 

ST 

H2f SKNDSV 

SEND0'> 

DSDL 

L 

R2/MC«i 

AIM 

R2.-2 

ST 

R2,MCW 

DL 

R0,CW'RCV»K15 

L'Sr 

R^.C«iRCV.R2 

L 

K0,C0'TYP,R15 

JC 

NEZ.NOni  Y 

L 

H2.USTSA 

CIM 

K2. lUDTSA 

JC 

LT. SEND  10 

AIM 

h2.2 

ST 

r2,mcn 

L 

R1 .CmOPTR 

ST 

Ht  5.CWU,R1 

IM 

CNOPfR 

ZR 

H15 

J 

NOTIFY 

SENDIB 

AR 

R1  .R0 

ST 

Hi ,CWTRA.K2 

IM 

IlSTSA 

LH 

Kl  ,»0 

L 

H0,MBUF1 

JC 

GZ, SEND  15 

AIM 

R1,MTSAP2-MTSAP1 
Appendix B

TUNING  THE  DAIS  MISSION  DFG 


The formal DAIS mission DFG was tuned by systematic application
of the techniques discussed in detail in Section 1.2.2. The major stages
in the tuning process, in order of application, were:

1)  Combining  tasks  and  simplifying  complex  constructs. 

2)  Partitioning  the  DFG  among  processors. 

3)  Preemption  limiting. 

In Stage 1, tasks which were closely related by their activation
conditions were combined into larger tasks. Complexity was reduced by
absorbing most of the control selector nodes into the task combinations.
The DFG was then redrawn to show the combined tasks and their
interrelations.

In Stage 2, the DFG was partitioned into two disjoint parts. The
partitioning chosen was based on the functionality of each processor load,
the space occupied by the tasks, the processor time required by each task,
and the bus load imposed by links connecting the partitions.
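
The quantitative criteria listed above can be tabulated for any candidate partition. The Python sketch below is only an illustration of that bookkeeping: the task and link figures, the units, and all names are hypothetical, and the "functionality" criterion is a design judgment that the sketch does not attempt to model.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    time: float   # processor time required (arbitrary units, assumed)
    space: int    # storage occupied (words, assumed)


def evaluate_partition(tasks, links, assignment):
    """Summarise a two-processor partition of a DFG.

    links: (source task, destination task, traffic) triples.
    assignment: maps each task name to processor 0 or 1.
    Returns the per-processor time and space totals and the bus load,
    i.e. the traffic on links whose ends lie in different partitions.
    """
    load = {0: {"time": 0.0, "space": 0}, 1: {"time": 0.0, "space": 0}}
    for task in tasks:
        proc = assignment[task.name]
        load[proc]["time"] += task.time
        load[proc]["space"] += task.space
    bus_load = sum(traffic for src, dst, traffic in links
                   if assignment[src] != assignment[dst])
    return load, bus_load


# Hypothetical three-task example.
tasks = [Task("T1", 2.0, 100), Task("T2", 1.0, 50), Task("T3", 3.0, 200)]
links = [("T1", "T2", 10), ("T2", "T3", 4)]
load, bus_load = evaluate_partition(tasks, links, {"T1": 0, "T2": 0, "T3": 1})
```

Comparing these totals across candidate assignments shows the trade-off between balancing processor load and keeping heavily used links inside one partition.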

In  Stage  3,  the  preemption  limiting  technique  was  applied  to  each 
of  the  DFG  partitions.  No  preemption  was  allowed  in  processor  1 while  a 
limited  amount  of  preemption  was  allowed  in  processor  0.  The  limitations 
on  preemption  avoid  contention  problems  for  most  of  the  data  selector 
storage  nodes  in  both  partitions.  The  resulting  DFG  was  drawn  showing 
only the data selector storage nodes which still required executive
management after preemption limitation.

Appendix C

ASSIGNMENT OF BUS SUBADDRESSES

The  strategy  for  assignment  of  subaddresses  has  been  designed 
to  maximize  efficiency  by  using  fixed  subaddress  allocations  for  the  most 
frequently  transmitted  messages  and  dynamic  allocation  for  the  numerous 
less  frequently  sent  messages. 

At any given point in time, exactly one processor is in master
mode (i.e., in control of the bus) and all others are in remote mode. Each
processor has three tables of subaddresses containing the addresses of
data areas for the transmittal and receipt of messages. These tables are:

1)  Master  transmit  - subaddresses  for  data  to  be 
sent  to  other  processors  and  remote  terminals 
while  in  master  mode. 

2)  Master  receive  - subaddresses  for  data  received 
from  remote  terminals  (excluding  other  processors) 
while  in  master  mode. 

3)  Remote  receive  - subaddresses  for  data  received 
from  other  processors  while  in  remote  mode. 

For a given processor, each active sink and outbound link requires
a master transmit subaddress, each active source requires a master
receive subaddress, and each active inbound data link requires a remote
receive subaddress. Subaddresses for any of these cases may be either
pre-allocated and fixed or dynamically allocated and variable. Since
dynamic allocation takes more time, frequently transmitted or received
messages are pre-allocated fixed subaddresses beginning with subaddress
1. The remaining subaddresses belong to a pool from which they are
allocated on a first-come, first-served basis. Dynamically allocated remote
receive subaddresses are allocated by the transmitting master processor.
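
The allocation scheme just described can be sketched in Python as follows. This is an illustrative model only, not the executive's implementation: the class and method names are hypothetical, the upper limit of 30 is the figure quoted in this appendix, and the sketch assumes that each request for a message without a fixed subaddress draws the next unused pool entry.

```python
# Which of the three tables each DFG element uses, as stated in the text.
TABLE_FOR_ELEMENT = {
    "sink": "master transmit",
    "outbound link": "master transmit",
    "source": "master receive",
    "inbound data link": "remote receive",
}


class SubaddressTable:
    """One subaddress table with a fixed part and a dynamic pool."""

    def __init__(self, fixed_messages, highest=30):
        # Pre-allocated messages receive fixed subaddresses starting at 1.
        self.fixed = {msg: sa for sa, msg in enumerate(fixed_messages, start=1)}
        self.next_free = len(self.fixed) + 1   # first subaddress in the pool
        self.highest = highest                 # last usable subaddress

    def subaddress_for(self, message):
        """Return a subaddress, or None when the pool is exhausted."""
        if message in self.fixed:
            return self.fixed[message]         # fast path: no allocation
        if self.next_free > self.highest:
            return None
        subaddress = self.next_free            # first come, first served
        self.next_free += 1
        return subaddress


# Small hypothetical table: three fixed messages, pool of subaddresses 4 and 5.
table = SubaddressTable(["D23", "D24", "D26"], highest=5)
fixed_sa = table.subaddress_for("D24")
pool_sas = [table.subaddress_for(m) for m in ("D40", "D45", "D50")]
```

The fixed part costs only a table lookup, which is why the most frequent messages are placed there; only the less frequent messages pay for pool allocation.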

In any of these cases, buffers for transmission or receipt of data
may be dynamically or statically allocated. Dynamic allocation is used
for remote receive subaddresses in instances where the time of receipt
is unknown, that is, where another copy may be received while one copy
of received data is still in use. In these cases, when bus control is
transferred to another processor, a new buffer is allocated for each
subaddress at which data has been received. All remote receive
subaddresses which are dynamically allocated also make use of
dynamically allocated buffers.
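
A minimal Python sketch of this buffer replacement is given below. It is a model under stated assumptions rather than the executive's code: the slot structure, the `received` flag, and the function name are hypothetical, and a Python list stands in for a storage buffer.

```python
class RemoteReceiveSlot:
    """A remote receive subaddress with a dynamically allocated buffer."""

    def __init__(self, size):
        self.size = size
        self.buffer = [0] * size   # buffer the bus currently writes into
        self.received = False      # set when a message has arrived


def on_bus_control_transfer(slots):
    """Hand over filled buffers and attach fresh ones.

    slots maps subaddress -> RemoteReceiveSlot. For every subaddress at
    which data has been received, the filled buffer is passed to the
    consumer and a new buffer is allocated, so a later message cannot
    overwrite data that is still in use.
    """
    handed_over = []
    for subaddress, slot in sorted(slots.items()):
        if slot.received:
            handed_over.append((subaddress, slot.buffer))
            slot.buffer = [0] * slot.size   # new buffer for the next message
            slot.received = False
    return handed_over


# Hypothetical example: data arrives at subaddress 3 only.
slots = {3: RemoteReceiveSlot(4), 4: RemoteReceiveSlot(4)}
slots[3].buffer[0] = 7
slots[3].received = True
old_buffer = slots[3].buffer
handed_over = on_bus_control_transfer(slots)
```

Subaddresses that received nothing keep their existing buffers, so the cost of allocation is paid only where data actually arrived.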

As mentioned above, the preferred method of subaddress assignment
is pre-allocation. In some cases it is not possible to pre-allocate
subaddresses for all messages in a given class because of the limited
number of subaddresses available (30, not including the subaddress used
for transfer of control). A case in point is the master transmit
subaddresses for processor 0. Processor 0 contains many sinks and
outbound data links (more than 70). The 16 most frequently transmitted
messages have been pre-allocated subaddresses. The remaining messages
compete for the remaining subaddresses. It is possible in this
case that, by coincidence, enough transmission requests occur in a short
period of time (since the last time the processor was master) to consume
all the available master transmit subaddresses. When this happens,
subroutine SEND queues the excess requests for execution the next time
the processor becomes master.
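
The overflow behavior can be sketched in Python as follows. This is a simplified model, not a transcription of subroutine SEND: the class and method names are hypothetical, and the sketch assumes that the pool is released once the processor has transmitted as master, at which point the queued requests are submitted again.

```python
from collections import deque


class MasterTransmitPool:
    """Dynamic part of a master transmit subaddress table."""

    def __init__(self, first, last):
        self.first = first          # first pool subaddress
        self.last = last            # last pool subaddress
        self.next_free = first
        self.assigned = []          # (subaddress, message) awaiting transmission
        self.pending = deque()      # excess requests, in arrival order

    def send(self, message):
        """Assign a pool subaddress, or queue the request if none is left."""
        if self.next_free > self.last:
            self.pending.append(message)
            return None
        subaddress = self.next_free
        self.next_free += 1
        self.assigned.append((subaddress, message))
        return subaddress

    def become_master(self):
        """Transmit assigned messages, free the pool, resubmit the queue."""
        transmitted = self.assigned
        self.assigned = []
        self.next_free = self.first
        queued, self.pending = self.pending, deque()
        for message in queued:
            self.send(message)
        return transmitted


# Hypothetical two-entry pool receiving three requests.
pool = MasterTransmitPool(first=17, last=18)
results = [pool.send(m) for m in ("D40", "D45", "D50")]
transmitted = pool.become_master()
```

In this example the third request finds the pool empty, waits in the queue, and takes the first pool subaddress once the processor has been master.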

The  following  pages  detail  the  subaddress  assignments  used  in 


PROCESSOR  0 SUBADDRESS  ASSIGNMENTS 


MTSAP   MASTER TRANSMIT              Notify Code

  1.    D23
  2.    D24                          ND24
  3.    D26
  4.    D27                          ND27
  5.    D41
  6.    D42                          ND42
  7.    D43
  8.    D44
  9.    D46
 10.    D54
 11.    D81
 12.    D82                          ND82
 13.    D48                          ND48
 14.    DSS57A, DSS57B, DSS57C       NDSS57A, NDSS57B, NDSS57C
 15.    DSS19(,38)                   NDSS19
 16.    DSS57(,58,61)
 17.-31.

        DSS37

        PRIL

Subaddresses 1 through 16: Preallocated Subaddresses

IUSTSA
        Pool of Dynamically Allocated Subaddresses
IUDTSA



UNALLOCATED MASTER TRANSMIT (Page 2 of 2)

MTSAP        Notify Code

D40
D45
D45A
D50
D51
D55
D79
D80
D85
D86
D95
D95A
DSS36,64
DSS21
DSS26
DSS26A       NDSS26A
DSS02
DSS02A
DSS17A
DSS17B
D45B
EG65
0G65
EG35
0G35
RDSS43
RDSS52



PROCESSOR  0 SUBADDRESS  ASSIGNMENTS 


MRSAP   MASTER RECEIVE       Notify Code

  1.    D01, D02, D03        ND01
  2.    D04
  3.    D14, D87             ND14
  4.    D16
  5.    D17
  6.    D18
  7.    D19
  8.    D20                  ND20
  9.    D21
 10.    D29
 11.    D30
 12.    D31
 13.    D32                  ND32
 14.    D33
 15.    D34
 16.    D35
 17.    D47                  ND47
 18.    D49                  ND49
 19.    D56                  ND56
 20.    D58                  ND58
 21.    D88                  ND88
 22.
 23.
 24.
 25.
 26.
 27.
 28.
 29.
 30.    DUMSTDR
 31.

Statically Allocated Subaddresses

IUDRSA, IUSRSA



PROCESSOR  0 SUBADDRESS  ASSIGNMENTS 


RRSAP 


Notify  Code 


A NDSS43 

Preallocated  NDSS52 
Subaddresses  NDSS11 
Statically 
Allocated  NDOl 


Buffers 

RRIMAX 


NFIN 


RRADYN.RRUDYN 


REMOTE  RECEIVE 


PROCESSOR 1 SUBADDRESS ASSIGNMENTS


DSS43 

DSS52 

ssn 

05? 


] 

2 

9 

0 

1 

2 

3 

4 

3 

4 

5 

6 

7 

8 

RIN 


(L 


1 


Notify  Code 

1 

Preallocated
Subaddresses

ND84 

ND53  • 


f 


IUSTSA, IUDTSA


MASTER  TRANSMIT 

